Title: Novel atypical pneumonia-causing virus

The invention relates to the field of virology.

The SARS outbreak of 2002-2003 has prompted a search for related viruses that
may have previously caused atypical pneumonias or that may do so in the future. A
respiratory illness (atypical pneumonia) was diagnosed in an 8-month-old patient that
could not be attributed to SARS (Severe Acute Respiratory Syndrome) virus or any other
known viral infection. The patient tested negative for influenza, parainfluenza, mumps
and RSV, and yet the disease was identified to be caused by a virus which closely
resembled SARS.

To be able to trace its origin, monitor its epidemiology and prevent possible
spreading of the disease, it is of great importance to be able to recognise viral causes of
pneumonia at an early stage. Especially if severe diseases are found to be caused by
viruses, it is necessary to determine the identity of the virus as soon as possible, in order to
develop diagnostic tools and possibly therapies. The SARS epidemic has shown that it is
paramount for prevention of spread of the disease to be able to get an early diagnosis in
order to take timely and effective isolation measures and initiate quarantine
precautions. Only then can world-wide contamination be prevented.

Furthermore, identification of the viral cause for the disease enables
development of vaccines, which can be used prophylactically to protect people who are at
risk of being infected. And, finally, knowledge of the viral cause enables the development of
therapeutic measures.

Thus, there is a great need for diagnostic tools and therapies for viral
pneumonias in general, and in particular for a novel disease-causing infectious agent,
especially when this agent appears to be a virus.

The invention provides the nucleotide sequence of an isolated essentially
mammalian positive-sense single stranded RNA virus belonging to the Coronaviruses,
which is the causative factor for the new disease, hereinafter referred to as EMCR-CoV,
the disease being referred to as EMCR-CoV-caused pneumonia. A virus according to
the invention is isolatable from a human with respiratory tract disease such as, but not
limited to, atypical pneumonia.

From a phylogenetic analysis of the Matrix and Nucleocapsid gene sequences of 
the virus (Fig. 1a and 1b) it appears that the virus is a distinct member of the group
formed by PEDV (porcine epidemic diarrhea virus), HCoV-229E (human coronavirus 
229E), PRCoV (porcine respiratory coronavirus), TGEV (transmissible gastroenteritis 
virus), CaCoV (Canine coronavirus) and FeCoV (feline coronavirus). In general, human 
coronavirus 229E seems to be the closest relative (at least for the Matrix and 
Nucleocapsid proteins). 

Although phylogenetic analyses provide a convenient method of identifying a
virus, several other, possibly more straightforward albeit somewhat coarser,
methods for identifying said virus or viral proteins or nucleic acids from said virus are
herein also provided. As a rule of thumb, an EMCR-Coronavirus can be identified by the
percentages of homology of the virus, proteins or nucleic acids to be identified in
comparison with viral proteins or nucleic acids identified herein by sequence. It is
generally known that virus species, especially RNA virus species, often constitute a
quasispecies wherein a cluster of said viruses displays heterogeneity among its
members. Thus it is expected that each isolate may have a somewhat different
percentage relationship with the sequences of the isolate as provided herein.

When one wishes to compare a virus isolate with the sequences as listed in figure
3, the invention provides an isolated essentially mammalian positive-sense single
stranded RNA virus (EMCR-CoV) belonging to the Coronaviruses and identifiable as
phylogenetically corresponding thereto by determining a nucleic acid sequence of said
virus and determining that said nucleic acid sequence has a percentage nucleic acid
identity to the sequences as listed higher than the percentages identified herein for the
nucleic acids as identified herein below in comparison with PEDV, 229E, PRCoV, TGEV,
CaCoV and FeCoV. Likewise, the invention provides an isolated essentially mammalian
positive-sense single stranded RNA virus (EMCR-CoV) belonging to the Coronaviruses and
identifiable as phylogenetically corresponding thereto by determining an amino acid
sequence of said virus and determining that said amino acid sequence has a percentage
amino acid homology to the sequences as listed which is essentially higher than the
percentages provided herein in comparison with PEDV, 229E, PRCoV, TGEV, CaCoV and FeCoV.

With the provision of the sequence information of this EMCR-Coronavirus
(EMCR-CoV), the invention provides diagnostic means and methods, prophylactic means
and methods and therapeutic means and methods to be employed in the diagnosis,
prevention and/or treatment of disease, in particular of respiratory disease (atypical
pneumonia), in particular of mammals, more in particular of humans, associated with
infection by this virus. In virology, it is most advisable that diagnosis, prophylaxis and/or
treatment of a specific viral infection is performed with reagents that are most specific
for said specific virus causing said infection. In this case this means that it is preferred
that said diagnosis, prophylaxis and/or treatment of an EMCR-CoV virus infection is
performed with reagents that are most specific for EMCR-CoV virus. This by no means
excludes, however, the possibility that less specific, but sufficiently cross-reactive
reagents are used instead, for example because they are more easily available and
sufficiently address the task at hand.

The invention for example provides a method for virologically diagnosing an
EMCR-CoV infection of an animal, in particular of a mammal, more in particular of a
human being, comprising determining in a sample of said animal the presence of a viral
isolate or component thereof by reacting said sample with an EMCR-CoV specific nucleic
acid or antibody according to the invention, and a method for serologically diagnosing an
EMCR-CoV infection of a mammal comprising determining in a sample of said mammal
the presence of an antibody specifically directed against an EMCR-CoV virus or
component thereof by reacting said sample with an EMCR-CoV virus-specific
proteinaceous molecule or fragment thereof or an antigen according to the invention.
The invention also provides a diagnostic kit for diagnosing an EMCR-CoV
infection comprising an EMCR-CoV virus, an EMCR-CoV virus-specific nucleic acid,
proteinaceous molecule or fragment thereof, antigen and/or an antibody according to the
invention, and preferably a means for detecting said EMCR-CoV virus, EMCR-CoV
virus-specific nucleic acid, proteinaceous molecule or fragment thereof, antigen and/or
an antibody, said means for example comprising an excitable group such as a
fluorophore or enzymatic detection system used in the art (examples of suitable
diagnostic kit formats comprise IF, ELISA, neutralization assay, RT-PCR assay). To
determine whether an as yet unidentified virus component or synthetic analogue thereof
such as nucleic acid, proteinaceous molecule or fragment thereof can be identified as
EMCR-CoV-virus-specific, it suffices to analyse the nucleic acid or amino acid sequence
of said component, for example for a stretch of said nucleic acid or amino acid,
preferably of at least 10, more preferably at least 25, more preferably at least 40
nucleotides or amino acids (respectively), by sequence homology comparison with the
provided EMCR-CoV viral sequences and with known non-EMCR-CoV viral sequences
(human coronavirus 229E is preferably used) using for example phylogenetic analyses as
provided herein. Depending on the degree of relationship with said EMCR-CoV or
non-EMCR-CoV viral sequences, the component or synthetic analogue can be identified.

The invention thus provides the nucleotide sequence of a novel etiological agent,
an isolated essentially mammalian positive-sense single stranded RNA virus (herein
also called EMCR-CoV virus) belonging to the Coronaviridae family, and EMCR-CoV
virus-specific components or synthetic analogues thereof.

Coronaviruses were first isolated from chickens in 1937, while the first human
coronavirus was propagated in vitro by Tyrrell and Bynoe in 1965. There are now about
13 species in this family, which infect cattle, pigs, rodents, cats, dogs, birds and man.
Coronavirus particles are irregularly shaped, about 60-220 nm in diameter, with an
outer envelope bearing distinctive, 'club-shaped' peplomers (about 20 nm long and 10
nm wide at the distal end). This 'crown-like' appearance gives the family its name. The
envelope carries two glycoproteins: S, the spike glycoprotein, which is involved in cell
fusion and is a major antigen, and M, the membrane glycoprotein, which is involved in
budding and envelope formation. The genome is associated with a basic phosphoprotein,
designated N. The genome of coronaviruses, a single stranded positive-sense RNA
strand, is typically 27-31 kb long and contains a 5' methylated cap and a 3' poly-A tail,
by which it can directly function as an mRNA in the infected cell. Initially the 5' ORF 1
(about 20 kb) is translated to produce a viral polymerase, which then produces a full
length negative sense strand. This is used as a template to produce mRNA as a 'nested
set' of transcripts, all with an identical 5' non-translated leader sequence of 72 nucleotides
and coincident 3' polyadenylated ends. Each mRNA thus produced is monocistronic, the
genes at the 5' end being translated from the longest mRNA and so on. These unusual
cytoplasmic structures are produced not by splicing, but by the polymerase during
transcription. Between each of the genes there is a repeated intergenic sequence -
AACUAAAC - which interacts with the transcriptase plus cellular factors to splice the
leader sequence onto the start of each ORF. In some coronaviruses there are about 8
ORFs, coding for the proteins mentioned above, but also for a haemagglutinin esterase
(HE), and several other non-structural proteins.
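
By way of illustration only, the repeated intergenic sequence AACUAAAC mentioned above could be located in a genomic RNA sequence with a simple string search. The following is a minimal sketch; the function name and the toy sequence are hypothetical and are not taken from the sequences of figure 3.

```python
def find_intergenic_motifs(genome_rna, motif="AACUAAAC"):
    """Return the 0-based start positions of every occurrence of the motif.

    The input may be given as RNA (U) or as cDNA (T); it is normalised to RNA.
    Overlapping occurrences are reported as well.
    """
    seq = genome_rna.upper().replace("T", "U")
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


# Toy sequence (hypothetical): two intergenic motifs, each followed by an AUG.
toy_genome = "GGAACUAAACAUGCCCAACUAAACAUGUUU"
print(find_intergenic_motifs(toy_genome))
```

Each reported position marks a candidate junction upstream of an ORF; whether a given site is actually used for transcription would have to be established experimentally.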

Newly isolated viruses are phylogenetically corresponding to and thus
taxonomically corresponding to EMCR-CoV virus when comprising a gene order and/or
amino acid sequence and/or nucleotide sequence sufficiently similar to our prototypic
EMCR-CoV virus. The highest amino acid sequence homology between EMCR-CoV
virus and any of the known other viruses of the same family to date (human coronavirus
229E or Porcine Epidemic Diarrhea Virus) is, for parts of the replicase polyprotein 1ab,
80-83% (see, for example, Fig. 3 sequences D and E; the % homology, and the virus to
which the homology is found, depend on the region of the replicase that is examined), as
can be deduced when comparing the sequences given in figure 3 with sequences of other
viruses, in particular of human coronavirus 229E. Individual proteins or whole virus
isolates with, respectively, higher homology than these mentioned maximum values are
considered phylogenetically corresponding and thus taxonomically corresponding to
EMCR-CoV virus, and generally will be encoded by a nucleic acid sequence structurally
corresponding with a sequence as shown in figure 3. Herewith the invention provides a
virus phylogenetically corresponding to the isolated virus of which the sequences are
depicted in figure 3.

It should be noted that, similar to other viruses, a certain degree of variation can
be expected to be found between EMCR-CoV viruses isolated from different sources.

Also, the viral sequence of the EMCR-CoV virus or an isolated EMCR-CoV virus
gene as provided herein for example shows less than 95%, preferably less than 90%,
more preferably less than 80%, more preferably less than 70% and most preferably less
than 65% nucleotide sequence homology or less than 95%, preferably less than 90%,
more preferably less than 80%, more preferably less than 70% and most preferably less
than 65% amino acid sequence homology with the respective nucleotide or amino acid
sequence of the human coronavirus 229E or Porcine Epidemic Diarrhea Virus as for
example can be found in Genbank (for example in accession number af304460
(HCoV-229E) or af353511 (PEDV)).

Sequence divergence of EMCR-CoV strains around the world may be somewhat
higher, in analogy with other coronaviruses.

The term "nucleotide sequence homology" as used herein denotes the presence of
homology between two polynucleotides. Polynucleotides have "homologous" sequences
if the sequence of nucleotides in the two sequences is the same when aligned for
maximum correspondence. Sequence comparison between two or more polynucleotides is
generally performed by comparing portions of the two sequences over a comparison
window to identify and compare local regions of sequence similarity. The comparison
window is generally from about 20 to 200 contiguous nucleotides. The "percentage of
sequence homology" for polynucleotides, such as 50, 60, 70, 80, 90, 95, 98, 99 or 100
percent sequence homology, may be determined by comparing two optimally aligned
sequences over a comparison window, wherein the portion of the polynucleotide 
sequence in the comparison window may include additions or deletions (i.e. gaps) as 
compared to the reference sequence (which does not comprise additions or deletions) for 
optimal alignment of the two sequences. The percentage is calculated by: (a) 
determining the number of positions at which the identical nucleic acid base occurs in 
both sequences to yield the number of matched positions; (b) dividing the number of 
matched positions by the total number of positions in the window of comparison; and (c) 
multiplying the result by 100 to yield the percentage of sequence homology. Optimal 
alignment of sequences for comparison may be conducted by computerized 
implementations of known algorithms, or by inspection. Readily available sequence 
comparison and multiple sequence alignment algorithms are, respectively, the Basic 
Local Alignment Search Tool (BLAST) (Altschul, S.F. et al. 1990. J. Mol. Biol. 215:403; 
Altschul, S.F. et al. 1997. Nucleic Acids Res. 25:3389-3402) and ClustalW programs, both
available on the internet. Other suitable programs include GAP, BESTFIT and FASTA 
in the Wisconsin Genetics Software Package (Genetics Computer Group (GCG), 
Madison, WI, USA). 
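
Steps (a) to (c) above can be sketched as follows. This is a minimal illustration which assumes that the two sequences have already been optimally aligned (for example by one of the programs named above) and are supplied as equal-length strings in which "-" denotes a gap; the function name and the example sequences are hypothetical.

```python
def percent_homology(aligned_query, aligned_reference):
    """Percentage of sequence homology over one comparison window.

    Both arguments are pre-aligned strings of equal length; '-' marks a gap.
    """
    if len(aligned_query) != len(aligned_reference):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        raise ValueError("comparison window is empty")
    # (a) count positions at which the identical base occurs in both sequences
    matched = sum(
        1
        for q, r in zip(aligned_query.upper(), aligned_reference.upper())
        if q == r and q != "-"
    )
    # (b) divide by the total number of positions in the window, (c) times 100
    return 100.0 * matched / len(aligned_query)


# Hypothetical 10-position window: one gap and one mismatch, 8 matches.
print(percent_homology("ACGT-ACGTA", "ACGTTACGAA"))  # 80.0
```

A gap position counts towards the window length but never as a match, which is one reasonable reading of step (b); other conventions for handling gaps exist.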

As used herein, "substantially complementary" means that two nucleic acid sequences
have at least about 65%, preferably about 70%, more preferably about 80%, even more 
preferably 90%, and most preferably about 98%, sequence complementarity to each 
other. This means that the primers and probes must exhibit sufficient complementarity 
to their template and target nucleic acid, respectively, to hybridise under stringent 
conditions. Therefore, the primer sequences as disclosed in this specification need not 
reflect the exact sequence of the binding region on the template and degenerate primers 
can be used. A substantially complementary primer sequence is one that has sufficient 
sequence complementarity to the amplification template to result in primer binding and 
second-strand synthesis. 

The term "hybrid" refers to a double-stranded nucleic acid molecule, or duplex, 
formed by hydrogen bonding between complementary nucleotides. The terms "hybridise" 
or "anneal" refer to the process by which single strands of nucleic acid sequences form 
double-helical segments through hydrogen bonding between complementary nucleotides. 

The term "oligonucleotide" refers to a short sequence of nucleotide monomers
(usually 6 to 100 nucleotides) joined by phosphorous linkages (e.g., phosphodiester, alkyl
and aryl-phosphate, phosphorothioate), or non-phosphorous linkages (e.g., peptide,
sulfamate and others). An oligonucleotide may contain modified nucleotides having
modified bases (e.g., 5-methyl cytosine) and modified sugar groups (e.g., 2'-O-methyl
ribosyl, 2'-O-methoxyethyl ribosyl, 2'-fluoro ribosyl, 2'-amino ribosyl, and the like).
Oligonucleotides may be naturally-occurring or synthetic molecules of double- and
single-stranded DNA and double- and single-stranded RNA with circular, branched or
linear shapes and optionally including domains capable of forming stable secondary
structures (e.g., stem-and-loop and loop-stem-loop structures).

The term "primer" as used herein refers to an oligonucleotide which is capable of
annealing to the amplification target allowing a DNA polymerase to attach, thereby
serving as a point of initiation of DNA synthesis when placed under conditions in which
synthesis of primer extension product which is complementary to a nucleic acid strand
is induced, i.e., in the presence of nucleotides and an agent for polymerization such as
DNA polymerase and at a suitable temperature and pH. The (amplification) primer is
preferably single stranded for maximum efficiency in amplification. Preferably, the
primer is an oligodeoxyribonucleotide. The primer must be sufficiently long to prime the
synthesis of extension products in the presence of the agent for polymerization. The
exact lengths of the primers will depend on many factors, including temperature and
source of primer. A "pair of bi-directional primers" as used herein refers to one forward
and one reverse primer as commonly used in the art of DNA amplification such as in
PCR amplification.

The term "probe" refers to a single-stranded oligonucleotide sequence that will 
recognize and form a hydrogen-bonded duplex with a complementary sequence in a 
target nucleic acid sequence analyte or its cDNA derivative. 

The terms "stringency" or "stringent hybridization conditions" refer to
hybridization conditions that affect the stability of hybrids, e.g., temperature, salt
concentration, pH, formamide concentration and the like. These conditions are
empirically optimised to maximize specific binding and minimize non-specific binding of
primer or probe to its target nucleic acid sequence. The terms as used include reference
to conditions under which a probe or primer will hybridise to its target sequence to a
detectably greater degree than to other sequences (e.g. at least 2-fold over background).
Stringent conditions are sequence dependent and will be different in different
circumstances. Longer sequences hybridise specifically at higher temperatures.
Generally, stringent conditions are selected to be about 5°C lower than the thermal
melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm
is the temperature (under defined ionic strength and pH) at which 50% of a
complementary target sequence hybridises to a perfectly matched probe or primer.
Typically, stringent conditions will be those in which the salt concentration is less than
about 1.0 M Na+ ion, typically about 0.01 to 1.0 M Na+ ion concentration (or other salts)
at pH 7.0 to 8.3 and the temperature is at least about 30°C for short probes or primers
(e.g. 10 to 50 nucleotides) and at least about 60°C for long probes or primers (e.g. greater
than 50 nucleotides). Stringent conditions may also be achieved with the addition of
destabilizing agents such as formamide. Exemplary low stringent conditions or
"conditions of reduced stringency" include hybridization with a buffer solution of 30%
formamide, 1 M NaCl, 1% SDS at 37°C and a wash in 2x SSC at 40°C. Exemplary high
stringency conditions include hybridization in 50% formamide, 1 M NaCl, 1% SDS at
37°C, and a wash in 0.1x SSC at 60°C. Hybridization procedures are well known in the
art and are described in e.g. Ausubel et al., Current Protocols in Molecular Biology, John
Wiley & Sons Inc., 1994.
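
The rule of selecting a temperature about 5°C below the Tm can be illustrated with a rough calculation. The sketch below estimates the Tm of a short oligonucleotide with the Wallace rule (2°C per A or T, 4°C per G or C), which is an assumption introduced here for illustration only and is not a method stated in this specification; the function names and the example oligonucleotide are likewise hypothetical. In practice the conditions are optimised empirically, as noted above.

```python
def wallace_tm_celsius(oligo):
    """Rough Tm estimate for a short oligonucleotide (Wallace rule).

    Assumption for illustration: Tm = 2 * (A + T) + 4 * (G + C), in degrees C.
    """
    seq = oligo.upper().replace("U", "T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if not seq or at + gc != len(seq):
        raise ValueError("expected a non-empty sequence of A, C, G and T/U")
    return 2.0 * at + 4.0 * gc


def suggested_stringent_temperature(oligo, offset=5.0):
    """Temperature about `offset` degrees C below the estimated Tm."""
    return wallace_tm_celsius(oligo) - offset


probe = "ATGCATGCATGCATGCAT"  # hypothetical 18-mer: 10 A/T and 8 G/C
print(wallace_tm_celsius(probe))               # 52.0
print(suggested_stringent_temperature(probe))  # 47.0
```

The Wallace rule ignores salt and formamide concentration, so the result is at best a starting point for the empirical optimisation.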

The term "antibody" includes reference to antigen binding forms of antibodies (e.g.,
Fab, F(ab)2). The term "antibody" frequently refers to a polypeptide substantially
encoded by an immunoglobulin gene or immunoglobulin genes, or fragments thereof
which specifically bind and recognize an analyte (antigen). However, while various
antibody fragments can be defined in terms of the digestion of an intact antibody, one of
skill will appreciate that such fragments may be synthesized de novo either chemically
or by utilizing recombinant DNA methodology. Thus, the term antibody, as used herein,
also includes antibody fragments such as single chain Fv, chimeric antibodies (i.e.,
comprising constant and variable regions from different species), humanized antibodies
(i.e., comprising a complementarity determining region (CDR) from a non-human
source) and heteroconjugate antibodies (e.g., bispecific antibodies).

In short, the invention provides an isolated essentially mammalian positive-sense
single stranded RNA virus (EMCR-CoV) belonging to the Coronaviruses and
identifiable as phylogenetically corresponding thereto by determining a nucleic acid
sequence of a suitable fragment of the genome of said virus and testing it in
phylogenetic tree analyses wherein maximum likelihood trees are generated using 100
bootstraps and 3 jumbles and finding it to be more closely phylogenetically
corresponding to a virus isolate having the sequences as depicted in figure 3 than it is
corresponding to a virus isolate of PEDV (porcine epidemic diarrhea virus), HCoV-229E
(human coronavirus 229E), PRCoV (porcine respiratory coronavirus), TGEV
(transmissible gastroenteritis virus), CaCoV (Canine coronavirus) and FeCoV (feline
coronavirus).

Suitable nucleic acid genome fragments each useful for such phylogenetic tree 
analyses are for example any of the fragments encoding the Matrix protein or the 
Nucleocapsid protein as disclosed in figure 3, leading to the phylogenetic tree analysis 
as disclosed herein in figure 1a or 1b.

A suitable open reading frame (ORF) comprises the ORF encoding the viral
replicase (ORF 1a). When an overall amino acid identity of at least 60%, preferably of at
least 70%, more preferably of at least 80%, more preferably of at least 90%, most
preferably of at least 95% of the analysed replicase with the replicase having a sequence
comprising the amino acid fragments A, B, C, D, E, and/or F of figure 3 is found, the
analysed virus isolate comprises an EMCR-CoV virus isolate according to the invention.

Another suitable open reading frame (ORF) useful in phylogenetic analyses
comprises the ORF encoding the Nucleocapsid protein. When an overall amino acid
identity of at least 60%, more preferably of at least 70%, more preferably of at least 80%,
more preferably of at least 90%, most preferably of at least 95% of the analysed
Nucleocapsid protein with the Nucleocapsid protein encoded by a sequence comprising
(part of) the sequence F of figure 3 is found, the analysed virus isolate comprises an
EMCR-CoV isolate according to the invention.

Another suitable open reading frame (ORF) useful in phylogenetic analyses
comprises the ORF encoding the Matrix protein. When an overall amino acid identity of
at least 60%, more preferably of at least 70%, more preferably of at least 80%, more
preferably of at least 90%, most preferably of at least 95% of the analysed Matrix
protein with the Matrix protein encoded by a sequence comprising (part of) the sequence
F of figure 3 is found, the analysed virus isolate comprises an EMCR-CoV isolate
according to the invention.

Another suitable open reading frame (ORF) useful in phylogenetic analyses
comprises the ORF encoding the spike protein S. When an overall amino acid identity of
at least 60%, more preferably of at least 70%, more preferably of at least 80%, more
preferably of at least 90%, most preferably of at least 95% of the analysed S-protein
with the S-protein encoded by a sequence comprising the sequence of translation 2 of E
and translation 1 of the F sequence as depicted in figure 3 is found, the analysed virus
isolate comprises an EMCR-CoV virus isolate according to the invention. The S ORF of
the EMCR-CoV virus seems to be located adjacent to the ORF 1ab (coding for the viral
replicase), which would discriminate EMCR-CoV viruses from the bovine coronavirus
and the murine hepatitis virus, which have a so-called 2a gene and an HE-gene between
the S protein and the viral polymerase.
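
The graded identity thresholds recited above for the replicase, Nucleocapsid, Matrix and spike proteins (at least 60%, 70%, 80%, 90% and 95%) lend themselves to a simple lookup. The following is a minimal sketch that reports the highest recited threshold met by a measured overall amino acid identity; the function name is hypothetical and the identity value is assumed to have been determined beforehand by alignment.

```python
# Thresholds as recited in the text, from most preferred to least preferred.
IDENTITY_THRESHOLDS = (95.0, 90.0, 80.0, 70.0, 60.0)


def highest_threshold_met(percent_identity):
    """Return the highest recited threshold met, or None if below 60%."""
    if not 0.0 <= percent_identity <= 100.0:
        raise ValueError("percent identity must lie between 0 and 100")
    for threshold in IDENTITY_THRESHOLDS:
        if percent_identity >= threshold:
            return threshold
    return None


print(highest_threshold_met(92.5))  # 90.0
print(highest_threshold_met(59.9))  # None
```

A return value of None indicates that, by this criterion alone, the analysed protein does not identify the isolate as an EMCR-CoV isolate; the phylogenetic tree analysis described above remains available as a separate criterion.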

The invention provides among others an isolated or recombinant nucleic acid or
virus-specific functional fragment thereof obtainable from a virus according to the
invention. The isolated or recombinant nucleic acids comprise the sequences as given in
figure 3 or sequences of homologues which are able to hybridise with those under
stringent conditions. In particular, the invention provides primers and/or probes
suitable for identifying an EMCR-CoV virus nucleic acid.

Furthermore, the invention provides a vector comprising a nucleic acid according
to the invention. To begin with, vectors such as plasmid vectors containing (parts of) the
genome of the EMCR-CoV virus, virus vectors containing (parts of) the genome of the
EMCR-CoV virus (for example, but not limited thereto, vaccinia virus, retroviruses,
baculovirus), or EMCR-CoV virus containing (parts of) the genome of other viruses or
other pathogens are provided.

Also, the invention provides a host cell comprising a nucleic acid or a vector
according to the invention. Plasmid or viral vectors containing the replicase components
of EMCR-CoV virus are generated in prokaryotic cells for the expression of the
components in relevant cell types (bacteria, insect cells, eukaryotic cells). Plasmid or
viral vectors containing full-length or partial copies of the EMCR-CoV virus genome will
be generated in prokaryotic cells for the expression of viral nucleic acids in vitro or in
vivo. The latter vectors may contain other viral sequences for the generation of chimeric
viruses or chimeric virus proteins, may lack parts of the viral genome for the generation
of replication defective virus, and may contain mutations, deletions or insertions for the
generation of attenuated viruses.

Infectious copies of EMCR-CoV virus (being wild type, attenuated, replication-defective
or chimeric) can be produced upon co-expression of the polymerase components
according to the state-of-the-art technologies described above.

In addition, eukaryotic cells transiently or stably expressing one or more full-length
or partial EMCR-CoV virus proteins can be used. Such cells can be made by
transfection (proteins or nucleic acid vectors), infection (viral vectors) or transduction
(viral vectors) and may be useful for complementation of the mentioned wild type,
attenuated, replication-defective or chimeric viruses.

A chimeric virus may be of particular use for the generation of recombinant
vaccines protecting against two or more viruses. For example, it can be envisaged that
an EMCR-CoV virus vector expressing one or more proteins of a human metapneumovirus
or a human metapneumovirus vector expressing one or more proteins of EMCR-CoV
virus will protect individuals vaccinated with such a vector against both virus infections.
Such a specific chimeric virus is particularly useful in the invention because it is
suspected that co-infection with, for instance, human metapneumovirus frequently occurs
in coronavirus infected patients. Attenuated and replication-defective viruses may be of
use for vaccination purposes with live vaccines as has been suggested for other viruses.

In a preferred embodiment, the invention provides a proteinaceous molecule or
coronavirus-specific viral protein or functional fragment thereof encoded by a nucleic
acid according to the invention. Useful proteinaceous molecules are for example derived
from any of the genes or genomic fragments derivable from a virus according to the
invention. Such molecules, or antigenic fragments thereof, as provided herein, are for
example useful in diagnostic methods or kits and in pharmaceutical compositions such
as sub-unit vaccines and inhibitory peptides. Particularly useful are the viral replicase
protein, the spike protein, the matrix protein, the nucleocapsid or antigenic fragments
thereof for inclusion as antigen or subunit immunogen, but inactivated whole virus can
also be used. Particularly useful are also those proteinaceous substances that are
encoded by recombinant nucleic acid fragments that are identified for phylogenetic
analyses; of course preferred are those that are within the preferred metes and bounds
of ORFs useful in phylogenetic analyses, in particular for eliciting EMCR-CoV virus
specific antibodies, whether in vivo (e.g. for protective purposes or for providing
diagnostic antibodies) or in vitro (e.g. by phage display technology or another technique
useful for generating synthetic antibodies).

Also provided herein are antibodies, be it natural polyclonal or monoclonal, or
synthetic (e.g. (phage) library-derived binding molecules) antibodies that specifically
react with an antigen comprising a proteinaceous molecule or EMCR-CoV virus-specific
functional fragment thereof according to the invention. Such antibodies are useful in a
method for identifying a viral isolate as an EMCR-CoV virus comprising reacting said
viral isolate or a component thereof with an antibody as provided herein. This can for
example be achieved by using purified or non-purified EMCR-CoV virus or parts thereof
(proteins, peptides) using ELISA, RIA, FACS or similar formats of antigen detection
assays (Current Protocols in Immunology). Alternatively, infected cells or cell cultures
may be used to identify viral antigens using classical immunofluorescence or
immunohistochemical techniques. Specifically useful in this respect are antibodies
raised against EMCR-CoV virus proteins which are encoded by a nucleotide sequence
comprising one or more of the fragments disclosed in figure 3.

Other methods for identifying a viral isolate as an EMCR-CoV virus comprise
reacting said viral isolate or a component thereof with a virus specific nucleic acid 
according to the invention. 

In this way the invention provides a viral isolate identifiable with a method 
according to the invention as a mammalian virus taxonomically corresponding to a 
positive-sense single stranded RNA virus identifiable as likely belonging to the
EMCR-CoV virus genus within the family of Coronaviruses.

The method is useful in a method for virologically diagnosing an EMCR-CoV 
virus infection of a mammal, said method for example comprising determining in a 
sample of said mammal the presence of a viral isolate or component thereof by reacting 
said sample with a nucleic acid or an antibody according to the invention. 

Methods of the invention can in principle be performed by using any nucleic acid
amplification method, such as the Polymerase Chain Reaction (PCR; Mullis 1987, U.S.
Pat. Nos. 4,683,195, 4,683,202, and 4,800,159) or by using amplification reactions such as
Ligase Chain Reaction (LCR; Barany 1991, Proc. Natl. Acad. Sci. USA 88:189-193; EP
Appl. No. 320,308), Self-Sustained Sequence Replication (3SR; Guatelli et al., 1990,
Proc. Natl. Acad. Sci. USA 87:1874-1878), Strand Displacement Amplification (SDA;
U.S. Pat. Nos. 5,270,184 and 5,455,166), Transcriptional Amplification System (TAS;
Kwoh et al., Proc. Natl. Acad. Sci. USA 86:1173-1177), Q-Beta Replicase (Lizardi et al.,
1988, Bio/Technology 6:1197), Rolling Circle Amplification (RCA; U.S. Pat. No.
5,871,921), Nucleic Acid Sequence Based Amplification (NASBA), Cleavase Fragment
Length Polymorphism (U.S. Pat. No. 5,719,028), Isothermal and Chimeric Primer-initiated
Amplification of Nucleic Acid (ICAN), Ramification-extension Amplification
Method (RAM; U.S. Pat. Nos. 5,719,028 and 5,942,391) or other suitable methods for
amplification of nucleic acids.

In order to amplify a nucleic acid with a small number of mismatches to one or more of 
the amplification primers, an amplification reaction may be performed under conditions 
of reduced stringency (e.g. a PCR amplification using an annealing temperature of 
38°C, or the presence of 3.5 mM MgCl2). The person skilled in the art will be able to
select conditions of suitable stringency. 

The primers herein are selected to be "substantially" complementary (i.e. at least
65%, more preferably at least 80% perfectly complementary) to their target regions
present on the different strands of each specific sequence to be amplified. It is possible
to use primer sequences containing e.g. inosine residues or ambiguous bases or even
primers that contain one or more mismatches when compared to the target sequence. In
general, sequences that exhibit at least 65%, more preferably at least 80% homology
with the target DNA or RNA oligonucleotide sequences, are considered suitable for use
in a method of the present invention. Sequence mismatches are also not critical when
using low stringency hybridization conditions.
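
The 65% and 80% complementarity thresholds can be checked for a candidate primer with a short calculation. The following is a minimal sketch, not part of the disclosed method: the function names are hypothetical, the primer and the target site are assumed to have equal length, and the symbol I is assumed to stand for a residue that pairs with any base.

```python
# Illustrative sketch; function names and the "I pairs with anything" rule are assumptions.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def percent_complementarity(primer: str, target_site: str) -> float:
    """Percentage of primer positions that base-pair with the target site.

    Both sequences are written 5'->3', so the primer is compared against the
    reversed target site. RNA targets are handled by reading U as T.
    """
    primer = primer.upper()
    site = target_site.upper().replace("U", "T")
    if not primer or len(primer) != len(site):
        raise ValueError("primer and target site must be non-empty and of equal length")
    matches = 0
    for p, t in zip(primer, reversed(site)):
        # 'I' is treated as a universal base (assumption); other symbols must pair exactly.
        if p == "I" or COMPLEMENT.get(p) == t:
            matches += 1
    return 100.0 * matches / len(primer)


def is_suitable(primer: str, target_site: str, threshold: float = 65.0) -> bool:
    """True when the primer meets the chosen complementarity threshold (65% by default)."""
    return percent_complementarity(primer, target_site) >= threshold
```

For example, `is_suitable(primer, site, threshold=80.0)` applies the more preferred 80% criterion. Real primer design would also consider the position of mismatches, in particular near the 3' end, which this sketch ignores.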

The detection of the amplification products can in principle be accomplished by
any suitable method known in the art. The detection fragments may be directly stained
or labelled with radioactive labels, antibodies, luminescent dyes, fluorescent dyes, or
enzyme reagents. Direct DNA stains include for example intercalating dyes such as
acridine orange, ethidium bromide, ethidium monoazide or Hoechst dyes.
Alternatively, the DNA or RNA fragments may be detected by incorporation of labelled
dNTP bases into the synthesized fragments. Detection labels which may be associated
with nucleotide bases include e.g. fluorescein, cyanine dye or BrdUrd.

When using a probe-based detection system, a suitable detection procedure for
use in the present invention may for example comprise an enzyme immunoassay (EIA)
format (Jacobs et al., 1997, J. Clin. Microbiol. 35, 791-795). For performing a detection
by means of the EIA procedure, either the forward or the reverse primer used in the
amplification reaction may comprise a capturing group, such as a biotin group for
immobilization of target DNA PCR amplicons on e.g. streptavidin-coated microtiter
plate wells for subsequent EIA detection of target DNA amplicons (see below). The
skilled person will understand that other groups for immobilization of target DNA PCR
amplicons in an EIA format may be employed.

Probes useful for the detection of the target DNA as disclosed herein preferably 
bind only to at least a part of the DNA sequence region as amplified by the DNA 
amplification procedure. Those of skill in the art can prepare suitable probes for 
detection based on the nucleotide sequence of the target DNA without undue 
experimentation as set out herein. Also the complementary nucleotide sequences, 
whether DNA or RNA or chemically synthesized analogs, of the target DNA may 
suitably be used as type-specific detection probes in a method of the invention, provided 
that such a complementary strand is amplified in the amplification reaction employed. 






Suitable detection procedures for use herein may for example comprise
immobilization of the amplicons and probing the DNA sequences thereof by e.g.
Southern blotting. Other formats may comprise an EIA format as described above. To
facilitate the detection of binding, the specific amplicon detection probes may comprise a
label moiety such as a fluorophore, a chromophore, an enzyme or a radio-label, so as to
facilitate monitoring of binding of the probes to the reaction product of the amplification
reaction. Such labels are well-known to those skilled in the art and include, for example,
fluorescein isothiocyanate (FITC), β-galactosidase, horseradish peroxidase, streptavidin,
biotin, digoxigenin, 35S or 125I. Other examples will be apparent to those skilled in the
art.

Detection may also be performed by a so-called reverse line blot (RLB) assay,
such as for instance described by Van den Brule et al. (2002, J. Clin. Microbiol. 40,
779-787). For this purpose RLB probes are preferably synthesized with a 5' amino group
for subsequent immobilization on e.g. carboxyl-coated nylon membranes. The advantage
of an RLB format is the ease of the system and its speed, thus allowing for high
throughput sample processing.

The use of nucleic acid probes for the detection of RNA or DNA fragments is well
known in the art. Mostly these procedures comprise the hybridization of the target
nucleic acid with the probe followed by post-hybridization washings. Specificity is
typically the function of post-hybridization washes, the critical factors being the ionic
strength and temperature of the final wash solution. For nucleic acid hybrids, the Tm
can be approximated from the equation of Meinkoth and Wahl, Anal. Biochem., 138:
267-284 (1984): Tm = 81.5 °C + 16.6 (log M) + 0.41 (% GC) - 0.61 (% form) - 500/L; where M
is the molarity of monovalent cations, % GC is the percentage of guanosine and cytosine
nucleotides in the nucleic acid, % form is the percentage of formamide in the
hybridization solution, and L is the length of the hybrid in base pairs. The Tm is the
temperature (under defined ionic strength and pH) at which 50% of a complementary
target sequence hybridizes to a perfectly matched probe. Tm is reduced by about 1 °C for
each 1% of mismatching; thus, the hybridization and/or wash conditions can be
adjusted to hybridize to sequences of the desired identity. For example, if sequences
with > 90% identity are sought, the Tm can be decreased 10 °C. Generally, stringent
conditions are selected to be about 5 °C lower than the thermal melting point (Tm) for
the specific sequence and its complement at a defined ionic strength and pH. However,
severely stringent conditions can utilize a hybridization and/or wash at 1, 2, 3, or 4 °C
lower than the thermal melting point (Tm); moderately stringent conditions can utilize a
hybridization and/or wash at 6, 7, 8, 9, or 10 °C lower than the thermal melting point
(Tm); low stringency conditions can utilize a hybridization and/or wash at 11, 12, 13, 14,
15, or 20 °C lower than the thermal melting point (Tm). Using the equation,
hybridization and wash compositions, and desired Tm, those of ordinary skill will
understand that variations in the stringency of hybridization and/or wash solutions are
inherently described. If the desired degree of mismatching results in a Tm of less than
45 °C (aqueous solution) or 32 °C (formamide solution) it is preferred to increase the
SSC concentration so that a higher temperature can be used. An extensive guide to the
hybridization of nucleic acids is found in Tijssen, Laboratory Techniques in Biochemistry
and Molecular Biology - Hybridization with Nucleic Acid Probes, Part I, Chapter 2,
"Overview of principles of hybridization and the strategy of nucleic acid probe assays",
Elsevier, New York (1993); and Current Protocols in Molecular Biology, Chapter 2,
Ausubel, et al., Eds., Greene Publishing and Wiley-Interscience, New York (1995).
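
The Tm approximation and the stringency offsets described above lend themselves to a worked calculation. The following is a minimal sketch under stated assumptions: the function names are hypothetical, "log" in the equation is taken to be the base-10 logarithm, and the 1 °C per 1% mismatch rule is applied linearly.

```python
import math


def melting_temperature(monovalent_molar: float, gc_percent: float,
                        formamide_percent: float, length_bp: int) -> float:
    """Approximate Tm in deg C: 81.5 + 16.6 log10(M) + 0.41 (%GC) - 0.61 (%form) - 500/L."""
    if monovalent_molar <= 0 or length_bp <= 0:
        raise ValueError("molarity and hybrid length must be positive")
    return (81.5
            + 16.6 * math.log10(monovalent_molar)
            + 0.41 * gc_percent
            - 0.61 * formamide_percent
            - 500.0 / length_bp)


def adjusted_tm(tm: float, min_identity_percent: float) -> float:
    """Lower Tm by about 1 deg C for each 1% of mismatch that is to be tolerated."""
    return tm - (100.0 - min_identity_percent)


def wash_temperature(tm: float, degrees_below_tm: float = 5.0) -> float:
    """Wash temperature a chosen number of degrees below Tm (5 deg C for stringent conditions)."""
    return tm - degrees_below_tm
```

As an illustration with made-up inputs, a 500 bp hybrid with 50% GC in 1 M monovalent cations and no formamide gives a Tm of 101.0 °C; seeking sequences of at least 90% identity lowers this to 91.0 °C, and a stringent wash would then be run at about 86.0 °C.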

In another aspect, the invention provides oligonucleotide probes for the generic
detection of target RNA or DNA. The detection probes herein are selected to be
"substantially" complementary to one of the strands of the double stranded nucleic acid
generated by an amplification reaction of the invention. Preferably the probes are
substantially complementary to the immobilizable, e.g. biotin labelled, antisense strand
of the amplicons generated from the target RNA or DNA.

It is allowable for detection probes of the present invention to contain one or 
more mismatches to their target sequence. In general, sequences that exhibit at least 
65%, more preferably at least 80% homology with the target oligonucleotide sequences 
are considered suitable for use in a method of the present invention. 

Antibodies, both monoclonal and polyclonal, can also be used for detection
purposes in the present invention, for example, in immunoassays in which they can be
utilized in liquid phase or bound to a solid phase carrier. In addition, the monoclonal
antibodies in these immunoassays can be detectably labeled in various ways. A variety
of immunoassay formats may be used to select antibodies specifically reactive with a
particular protein (or other analyte). For example, solid-phase ELISA immunoassays are
routinely used to select monoclonal antibodies specifically immunoreactive with a
protein. See Harlow and Lane, Antibodies, A Laboratory Manual, Cold Spring Harbor
Publications, New York (1988), for a description of immunoassay formats and conditions
that can be used to determine selective binding. Examples of types of immunoassays
that can utilize antibodies of the invention are competitive and non-competitive
immunoassays in either a direct or indirect format. Examples of such immunoassays are
the radioimmunoassay (RIA) and the sandwich (immunometric) assay. Detection of the
antigens using the antibodies of the invention can be done utilizing immunoassays that
are run in either the forward, reverse, or simultaneous modes, including
immunohistochemical assays on physiological samples. Those of skill in the art will
know, or can readily discern, other immunoassay formats without undue
experimentation.

Antibodies can be bound to many different carriers and used to detect the
presence of the target molecules. Examples of well-known carriers include glass,
polystyrene, polypropylene, polyethylene, dextran, nylon, amylases, natural and
modified celluloses, polyacrylamides, agaroses and magnetite. The nature of the carrier
can be either soluble or insoluble for purposes of the invention. Those skilled in the art
will know of other suitable carriers for binding monoclonal antibodies, or will be able to
ascertain such using routine experimentation.

The invention also provides a method for serologically diagnosing an EMCR-CoV
virus infection of a mammal comprising determining in a sample of said mammal the
presence of an antibody specifically directed against an EMCR-CoV virus or component
thereof by reacting said sample with a proteinaceous molecule or fragment thereof or an
antigen according to the invention.

Methods and means provided herein are particularly useful in a diagnostic kit for 
diagnosing an EMCR-CoV virus infection, be it by virological or serological diagnosis. 
Such kits or assays may for example comprise a virus, a nucleic acid, a proteinaceous 
molecule or fragment thereof, an antigen and/or an antibody according to the invention. 

Use of a virus, a nucleic acid, a proteinaceous molecule or fragment thereof, an
antigen and/or an antibody according to the invention is also provided for the production
of a pharmaceutical composition, for example for the treatment or prevention of
EMCR-CoV virus infections and/or for the treatment or prevention of atypical pneumonia, in
particular in humans. Preferably a peptide comprising part of the amino acid sequence
of the spike protein as depicted in the relevant translations of sequences E and F of
figure 3, is used for the preparation of a therapeutic or prophylactic peptide. Also
preferably, a protein comprising the amino acid sequence of the spike protein as
depicted in the relevant translations of sequences E and F of figure 3, is used for the
preparation of a sub-unit vaccine. Furthermore, the nucleocapsid of Coronaviruses, as
depicted in the translation of sequence F, in figure 3, is known to be particularly useful
for eliciting cell-mediated immunity against Coronaviruses and can be used for the
preparation of a sub-unit vaccine.

Attenuation of the virus can be achieved by established methods developed for 
this purpose, including but not limited to the use of related viruses of other species, 
serial passages through laboratory animals and/or tissue/cell cultures, serial passages
through cell cultures at temperatures below 37°C (cold-adaptation), site directed
mutagenesis of molecular clones and exchange of genes or gene fragments between 
related viruses. 

A pharmaceutical composition comprising a virus, a nucleic acid, a proteinaceous
molecule or fragment thereof, an antigen and/or an antibody according to the invention
can for example be used in a method for the treatment or prevention of an EMCR-CoV
virus infection and/or a respiratory illness comprising providing an individual with a
pharmaceutical composition according to the invention. This is most useful when said
individual is a human. Antibodies against EMCR-CoV virus proteins, especially
against the spike protein of EMCR-CoV virus, preferably against the amino acid 
sequence as depicted in translation 2 of sequence E and translation 1 of sequence F in 
figure 3, are also useful for prophylactic or therapeutic purposes, as passive vaccines. It 
is known from other coronaviruses that the spike protein is a very strong antigen and 
that antibodies against spike protein can be used in prophylactic and therapeutic 
vaccination. 

The invention also provides a method to obtain an antiviral agent useful in the
treatment of atypical pneumonia comprising establishing a cell culture or experimental
animal comprising a virus according to the invention, treating said culture or animal
with a candidate antiviral agent, and determining the effect of said agent on said virus
or its infection of said culture or animal. An example of such an antiviral agent 
comprises an EMCR-CoV virus-neutralising antibody, or functional component thereof, 
as provided herein, but antiviral agents of other nature are obtained as well. 

The invention also provides use of an antiviral agent according to the invention
for the preparation of a pharmaceutical composition, in particular for the preparation of
a pharmaceutical composition for the treatment of atypical pneumonia, specifically
when caused by an EMCR-CoV virus infection, and provides a pharmaceutical
composition comprising an antiviral agent according to the invention, useful in a method
for the treatment or prevention of an EMCR-CoV virus infection or atypical pneumonia,
said method comprising providing an individual with such a pharmaceutical
composition.

The invention also comprises an animal model usable for testing of prophylactic
and/or therapeutic methods and/or preparations. It is hypothesized that apes can be
infected with the EMCR-CoV virus, thereby showing clinical symptoms, and more
importantly, similar tissue morphology as found in humans suffering from atypical
pneumonia caused by the EMCR-CoV virus. Subjecting apes to a prophylactic or
therapeutic treatment either before or during infection with the virus will have a good
and useful predictive value for application of such a prophylaxis or therapy in
human subjects.

The invention is further explained in the Examples without limiting it thereto. 

Figure legends

Fig. 1: Phylogenetic relationship for the nucleotide sequences of isolate EMCR-CoV with
its genetically closest relatives. Phylogenetic trees were generated by maximum
likelihood analyses using 100 bootstraps and 3 jumbles. The scale representing the
number of nucleotide changes is shown for each tree. Figure 1a. Maximum likelihood
tree of matrix gene nucleotide sequences. Numbers in trees represent bootstrap values.
The scale bar roughly reflects 10 % nucleotide differences between related sequences.
Figure 1b. Maximum likelihood tree of nucleocapsid gene nucleotide sequences.
Numbers in trees represent bootstrap values. The scale bar roughly reflects 10 %
nucleotide differences between related sequences.

Fig. 2: Similarity matrix indicating the nucleotide and amino acid identity for the
putative Matrix protein (2a and 2b resp.) and for the putative Nucleoprotein (2c and 2d
resp.) between the EMCR-CoV virus and closely related coronaviruses. See text for
abbreviations.

Fig. 3: Nucleotide sequences from parts of the EMCR-CoV virus. Also included are the
putative polypeptide sequences of polypeptides and alignments of the putative
polypeptides with that of another member of the Coronaviridae family, where possible
(mostly HCoV-229E).

Examples
Specimen collection 

Virus was collected from an 8-month-old patient suffering from pneumonia using nasal
swabs.

Virus isolation and culture 

Throat swabs were dipped into a culture of tMK cells and passaged four times. Virus
was then cultured in Vero-118 cells. One litre of virus-containing cell culture supernatant
was harvested, the virus was pelleted in an ultracentrifuge, and the virus pellet was
resuspended in 1 ml PBS.

RNA isolation 

RNA was isolated from the supernatant of infected cell cultures or sucrose gradient
fractions using a High Pure RNA Isolation kit according to instructions from the
manufacturer (Roche Diagnostics, Almere, The Netherlands).

Sequencing 

Purified RNA was sent to BaseClear Holding BV (Leiden, The Netherlands) for
sequencing.

Phylogenetic analyses

Nucleotide sequences were aligned using Clustal W running under BioEdit version
5.0.9. Maximum likelihood trees were created using the Seqboot and DNA-ML packages
of Phylip 3.6 using 100 bootstraps and 3 jumbles. The consensus trees were calculated
using the Consense package of Phylip 3.6. These consensus trees were used as usertree
in DNA-ML to recalculate the branch lengths from the original sequences.
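
The bootstrap step of the analysis above resamples alignment columns with replacement to produce pseudo-replicate data sets, each of which is then used to build a tree. The following is a minimal sketch of that resampling idea only, not the Phylip code: the function names are hypothetical, the alignment is assumed to be a dictionary mapping sequence names to equal-length aligned strings, and tree building and consensus calculation are left to dedicated software.

```python
import random


def bootstrap_alignment(alignment: dict, rng: random.Random) -> dict:
    """Return one pseudo-replicate in which columns are sampled with replacement."""
    names = list(alignment)
    if not names:
        raise ValueError("alignment is empty")
    length = len(alignment[names[0]])
    if length == 0 or any(len(alignment[name]) != length for name in names):
        raise ValueError("all sequences must be aligned to the same non-zero length")
    # The same sampled column indices are applied to every sequence,
    # so each resampled column remains a genuine alignment column.
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


def bootstrap_replicates(alignment: dict, n_replicates: int = 100, seed: int = 1) -> list:
    """Generate n_replicates pseudo-replicate alignments (100 matches the text above)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]
```

Bootstrap values shown on a consensus tree then indicate in how many of the replicate trees a given grouping was recovered.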

The sequences of EMCR-CoV were compared with those of reference viruses
representing each species in the four groups of coronaviruses. These were: human
coronavirus 229E (229E), af304460; porcine epidemic diarrhea virus (PEDV), af353511;
transmissible gastroenteritis virus (TGEV), aj271965; bovine coronavirus (BoCoV),
af220295; murine hepatitis virus (MHV), af201929; avian infectious bronchitis virus
(AIBV), m95169; canine coronavirus (CaCoV), d13096; feline coronavirus (FeCoV),
ay204704; porcine respiratory coronavirus (PRCoV), z24675; human coronavirus OC43
(OC43), m7637S, 114643, m933990; porcine haemagglutinating encephalomyelitis virus
(HEV), ay078417; rat coronavirus (RtCoV), af207551. References for the viruses are the
numbers of the NCBI catalog (http://www.ncbi.nlm.nih.gov/entrez/).


In general, coronaviruses such as EMCR-CoV can be isolated and identified according
to the following protocol: 
Specimen collection 

In order to find virus isolates, nasopharyngeal aspirates, throat and nasal swabs,
bronchoalveolar lavages, serum and plasma samples, and stools, preferably from
mammals such as humans, carnivores (dogs, cats, mustelids, seals etc.), horses,
ruminants (cattle, sheep, goats etc.), pigs, rabbits, birds (poultry, ostriches, etc.), should
be examined. From birds, cloaca swabs and droppings can be examined as well. Sera
should be collected for immunological assays, such as ELISA, molecular-based assays,
such as RT-PCR, and virus neutralisation assays.

Collected virus specimens may be diluted with 5 ml Dulbecco MEM medium
(BioWhittaker, Walkersville, MD) and thoroughly mixed on a vortex mixer for one
minute. The suspension is then centrifuged for ten minutes at 840 x g. The sediment is
spread on a multispot slide (Nutacon, Leimuiden, The Netherlands) for
immunofluorescence techniques, and the supernatant is used for virus isolation.
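
Centrifugation is specified here as a relative centrifugal force (840 x g), whereas many centrifuges are set in revolutions per minute. The following is a minimal sketch of the standard conversion RCF = 1.118 x 10^-5 x r x rpm^2, with r the rotor radius in centimetres. The function names are hypothetical, and the rotor radius is not given in the text, so it must be supplied for the rotor actually used.

```python
import math

# Standard conversion constant for a radius in cm and a speed in rpm.
RCF_CONSTANT = 1.118e-5


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) for a given speed and rotor radius."""
    if rpm < 0 or radius_cm <= 0:
        raise ValueError("rpm must be non-negative and radius must be positive")
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed (rpm) needed to reach a given relative centrifugal force."""
    if rcf < 0 or radius_cm <= 0:
        raise ValueError("rcf must be non-negative and radius must be positive")
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))
```

For instance, `rpm_from_rcf(840.0, radius_cm)` gives the speed setting for the 840 x g steps once the radius of the rotor in use is known.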

Virus isolation 

For virus isolation Vero-118 cells or tMK cells (RIVM, Bilthoven, The Netherlands) were
cultured in 24 well plates containing glass slides (Costar, Cambridge, UK), with the
medium described below supplemented with 10% fetal bovine serum (BioWhittaker,
Verviers, Belgium). Before inoculation the plates were washed with PBS and supplied
with Eagle's MEM with Hanks' salt (ICN, Costa Mesa, CA) supplemented with 0.52 gram/liter
NaHCO3, 0.025 M Hepes (BioWhittaker), 2 mM L-glutamine (BioWhittaker), 200
units/liter penicillin, 200 µg/liter streptomycin (BioWhittaker), 1 gram/liter
lactalbumin (Sigma-Aldrich, Zwijndrecht, The Netherlands), 2.0 gram/liter D-glucose
(Merck, Amsterdam, The Netherlands), 10 gram/liter peptone (Oxoid, Haarlem, The
Netherlands) and 0.02% trypsin (Life Technologies, Bethesda, MD). The plates were
inoculated with supernatant of the patient samples, 0.2 ml per well in triplicate,
followed by centrifuging at 840 x g for one hour. After inoculation the plates were
incubated at 37 °C for 1-7 days and cultures were checked daily for CPE. Extensive CPE
was generally observed within 5-10 days and included detachment of cells from the
monolayer.

Virus culture 

Sub-confluent monolayers of tMK cells or Vero clone 118 cells in media as described 
above were inoculated with supernatants of samples that displayed CPE or with 
samples taken from a patient. 

RNA isolation 

RNA was isolated from the supernatant of infected cell cultures or sucrose gradient 
fractions using a High Pure RNA Isolation kit according to instructions from the 
manufacturer (Roche Diagnostics, Almere, The Netherlands). RNA can also be isolated
following other procedures known in the field (Current Protocols in Molecular Biology). 

Sequence analysis 

Sequence analyses were performed as follows: Purified viral RNA (500 ng) was converted
to cDNA using the SuperScript Choice system (Invitrogen Corp., Carlsbad, CA) by
random priming according to the manufacturer's instructions. Blunt-ended,
double-stranded cDNA fragments were size-selected on agarose gel to include fragments
ranging from 750 bp to 4 kb. Following purification by spin column (Zymo Research,
Orange, CA), cDNA fragments were ligated into pSMART-HCAmp (Lucigen Corp.,
Middleton, WI). The resulting library was electroporated into DH10B ElectroMAX cells
(Invitrogen Corp., Carlsbad, CA), and inserts were amplified from individual colonies
using pSMART AmpL1 and AmpR1 primers. PCR fragments were sequenced using
BigDye 3.1 chemistry and run on an ABI3730 machine (Applied Biosystems, Foster City,
CA).

18.11.2003

Claims 



1. An isolated essentially mammalian positive-sense single stranded RNA virus
(EMCR-CoV) comprising one or more of the sequences of figure 3.

2. An isolated positive-sense single stranded RNA virus (EMCR-CoV) belonging to
the Coronaviruses and identifiable as phylogenetically corresponding thereto by
determining a nucleic acid sequence of said virus and testing it in phylogenetic tree
analyses wherein maximum likelihood trees are generated using 100 bootstraps and 3
jumbles and finding it to be more closely phylogenetically corresponding to a virus
isolate having the sequences as depicted in figure 3 than it is corresponding to a virus
isolate of PEDV (porcine epidemic diarrhea virus), HCoV-229E (human coronavirus
229E), PRCoV (porcine respiratory coronavirus), TGEV (transmissible gastroenteritis
virus), CaCoV (canine coronavirus) and FeCoV (feline coronavirus).

3. A virus according to claim 1 or 2 wherein said nucleic acid sequence comprises an
open reading frame (ORF) encoding a viral protein of said virus.

4. A virus according to claim 3 wherein said open reading frame is selected from the
group of ORFs encoding the viral replicase, nucleocapsid protein, matrix protein or the
spike protein.

5. A virus according to any one of claims 1 to 4 isolatable from a human with respiratory tract
disease such as, but not limited to, atypical pneumonia.



6. An isolated or recombinant nucleic acid or EMCR-CoV virus-specific functional
fragment thereof obtainable from a virus according to any one of claims 1 to 5.

7. A vector comprising a nucleic acid according to claim 6.

8. A host cell comprising a nucleic acid according to claim 6 or a vector according to 
claim 7. 

9. An isolated or recombinant proteinaceous molecule or EMCR-CoV virus-specific
functional fragment thereof encoded by a nucleic acid according to claim 6. 

10. An antigen comprising a proteinaceous molecule or EMCR-CoV virus-specific 
functional fragment thereof according to claim 9. 

11. An antibody specifically directed against an antigen according to claim 10. 

12. A method for identifying a viral isolate as an EMCR-CoV virus comprising 
reacting said viral isolate or a component thereof with an antibody according to claim 
11. 

13. A method for identifying a viral isolate as an EMCR-CoV virus comprising 
reacting said viral isolate or a component thereof with a nucleic acid according to claim 
6. 

14. A method for virologically diagnosing an EMCR-CoV infection of a mammal 
comprising determining in a sample of said mammal the presence of a viral isolate or 
component thereof by reacting said sample with a nucleic acid according to claim 6 or an 
antibody according to claim 11. 

15. A method for serologically diagnosing an EMCR-CoV infection of a mammal 
comprising determining in a sample of said mammal the presence of an antibody 
specifically directed against an EMCR-CoV virus or component thereof by reacting said 
sample with a proteinaceous molecule or fragment thereof according to claim 9 or an 
antigen according to claim 10. 

16. A diagnostic kit for diagnosing an EMCR-CoV infection comprising a virus 
according to any one of claims 1 to 5, a nucleic acid according to claim 6, a proteinaceous
molecule or fragment thereof according to claim 9, an antigen according to claim 10 
and/or an antibody according to claim 11. 

17. Use of a virus according to any one of claims 1 to 5, a nucleic acid according to claim
6, a vector according to claim 7, a host cell according to claim 8, a proteinaceous
molecule or fragment thereof according to claim 9, an antigen according to claim 10, or
an antibody according to claim 11 for the production of a pharmaceutical composition.



18. Use according to claim 17 for the production of a pharmaceutical composition for
the treatment or prevention of an EMCR-CoV virus infection.

19. Use according to claim 17 or 18 for the production of a pharmaceutical 
composition for the treatment or prevention of atypical pneumonia. 

20. A pharmaceutical composition comprising a virus according to any one of claims
1 to 5, a nucleic acid according to claim 6, a vector according to claim 7, a host cell
according to claim 8, a proteinaceous molecule or fragment thereof according to claim 9,
an antigen according to claim 10, or an antibody according to claim 11.

21. A method for the treatment or prevention of an EMCR-CoV virus infection
comprising providing an individual with a pharmaceutical composition according to
claim 20.

22. A method for the treatment or prevention of atypical pneumonia comprising
providing an individual with a pharmaceutical composition according to claim 20.

23. A viral replicase encoded by an RNA sequence comprising the sequences A, B, C,
D, E and/or F, or homologues thereof as depicted in figure 3.

24. A viral spike protein comprising the amino acid sequence depicted as a
translation of (part of) sequences E and F as depicted in figure 3, or a homologue
thereof.

25. A viral nucleocapsid encoded by an RNA sequence comprising a translation of
(part of) the sequence F as depicted in figure 3 or a homologue thereof.

26. A viral nsp 3 or envelope protein encoded by an RNA sequence comprising a
translation of (part of) the sequence F as depicted in figure 3.

27. A nucleic acid sequence which comprises one or more of the sequences A to F as
depicted in figure 3 or a nucleic acid sequence which can hybridise with any of these 
sequences under stringent conditions. 




Abstract 




The invention relates to the field of virology. The invention provides a new
isolated essentially mammalian positive-sense single stranded RNA virus
(EMCR-CoV) within the group of coronaviruses and components thereof.

Figure 3

RNA sequences, implied polypeptides and alignment with one close relative 
1. Sequence A 

CCGCTGCACAAGGTTTTCAGGCATGCCGCTTTGTTGCTTTTGGCTTACAG 

acgSSg^S 

ATraGCGAGGTTGGCTCATTTTTTCTAAGAGCAATTATGTO 

CAGGAAGTGTGGTTTTTGTGGATAAGTATATGTGTGGTTTTGATGGTAAACCTGTGTTACCTAAAAACATGTGGG 
AATTTAGAGATTACrTTAATGATAATACTGATAGTATTGTTATTGGTGGTGTCACTTATCAATTAGCATGGGATG 

SacISaaag^ 

ctcatactt^gtctggttgcaaactcattaatgccaagccgcctaa^^ 
gtgaatggaatgctgtgtataaggcgtttggtt^ 

ttaaaccagttttctttaatgcttttgttaaatggaattgtggttctgagaattggagtgttggtgca^ 

GTTATCTATCTTCTTGTTGTGGCaCACOTGCTAAGAAACTTTGTGTTGTTCCTGGTAA 
TGATCATCACCTCAACTGATGCTGGTTGTGGTGTTAAATACT^^ 

TTACTGGTGTGTCTTTATGGCGTGTTAC^GCTGTTCATTCTGATGGAATGTTTGTGGCTVACATCTTCTTA^TG 
CACTTTTGCATAGAAATTCATTAGACCCTTTTTGCTTTGATGTTAAC7VCTTTACTTTCTAATCAATTACGTCTAG 

CTTTTCTT0K3TCCTTCTGTTAC^^ 

TGTTTGGTOTTTACGATGACATATTGACAAACAATAAACCTTGGTTTGTACGCAAAGCTTCTGGGCU"rx^ 

CAATCTGGGATGCTTTTGTTGCCGCTATTAAGCTTGTGCCAACTACTACTG 

CTATCGCITCAACTGTTTTAACTGTTTCTAATGGT^^ 

CAGTTTACCGCAC^TTTACACAAGCTATTTGTGCTGC^TTTGATrTTTCTTTAGATGTATTO 

TTAAATTTAAACGACTTGGTGATTATGTTCTTACTGAAAATGCTCTTGTTCGTTTGACTACTGAAGTTGTTCGTG 
GTGTTCGTGATGCTCGGATAAAGAAAGCCATGTTTACTAAAGTAGTTGTAGGTCCTACAACTGAAGTTAAGTTTT 
CTGTTATTGAACTTGCCACTGTTAA.TTTGCGTCTTGTTGATTGTGCACCTGTAGTTTGCCGTAAAGGTAAAATTG 
TTGTTATTGCTGGACAAGCTTTTTTCTATAGTGGTGGTTTTTATCGTTTTATGGTTGATTCTACAACTGTATTAA 

ATGACCCTGTTTTTACTGGTGAGTTATTTT^^ 
AGTTTGTTAATGCTAGTTCTGCTACAGATGCCATTATTGCrTO 

TTTTTGTGSiCACATGTGTGGTTGATGGTTGTAGTGTCATTGTTAGACGTGATGCTACATTCGCCACACATGTGT 
GTTTTAAGGACTGTTATAGTATTTGGGAGCAATTCTGCATT^ 

ATAATGCTATCTTGCAGAGTAATAACCCTCAATGTGCTATTGTTCAAGCATCGGAGTCTAAAGTTTTGCTTGAGA 
GGTTTTTACCTAAGTGTCCTGAAGTACTGTTGAGTATTGATGATGGCCATTTATGGAATCTTTTTGTTGAAAAGT 
TTAATTTTGTTACAGATTGGTTAAAAACTCTTAAGCTTACACTTACTTCTO 

AACGTTTTAGACGTGTTTTGGTAAAATTGCTTGATGTCTATAATGGTTTTCITGAAACTGTCTGTAGTGTCGTAC 
ACACTGCTGGTGTTTGC^TTAAATATTATGCTGTTAATGTTCCATATGTAGTTATTAGTGGTTTTGTAAGTCGTG 
TAATTGGTAGAGAAAGGTGTGACGTGACTTTTCCTTGTGTTAGTTGTGTCACTTTTTTCTATGAATTTTTAGACA 
CGTGTTTTGGTGTTAGTAAA.CCTAATGCCATTGATGTTGAA.C^TTTAGAGCTTAAAGAAACTGTTTTTGTTGAAC 
CTAAGGATGGTGGTCAATTTTTTGTTTCTGATGATTATCTTTGGTATGTTGTAGATGACATTTATTATCCAGCTT 
C^TGTAATGGTGTATTGCCAGTTGCTTTTACAAAATTGGCAGGTGGTAAAATATCTTTTTCTGATGATGTTATAG 
TTGATGATGTTGAACCTACCCATAAAGTCAAGCTCATATTTGAGTTTGAAGATGATGTTGTTACCAGTCTTTGTA 
AGAAGAGTTTTGGTAAGTCTATTATTTATACAGGTGATTGGGAAGGTTTACATGAAGTTCTTACATCTGCAATGA 
ATGTCATTGGGCAACATATTAAGTTGCC^CAATTTTATATTTATGATGAAGAGGGTGGTTATGATGTTTCTAAAC 
GAGTTATGATTTCACAATGGCCTATTAGTGATGATAGTGATGGTTGTGTTGTTGAAGCGAGCACTGATTTTCATC 
AATTAGAATCTGCTAGAGAAGAGGTTGATATAATTOAACAACCTTTTGGGGAAGTTGAAC^ 

GACAACCTTTTTCTTTTTCTTTTAGAGATGAATTGGGTGTTCGTGTTTTAGATCAATGTGATAATAATTGTTGGA 
TTAGTACCACACTTATACAGTTGCAACTTACAAAGCTTTTGGATGATTCTATTGAGATGGAATTGTTTAAAGTTG 
GTAAAGTTGATTCAATTGTTGAA^GTGTTATGAGTTGTCTCATTTAATTAGTGGTTCACTTGGTGATAGTGGTA 

AACTTCTTAGTGAACTTCTTAAAGATAAATATACATGTO 
CAGCTGTTCTCG 

Putative ORFs

>~out: 140 to 310: Frame 2 57 aa 

ASVVFSWKHWWCPLVHTIJI^ 
>~out: 267 to 3761: Frame 3 1165 aa






LLTMFYNQVTLAraSDSEISGFGFAIPSVAVRAYSEA^^ 

AKI LLFSDRPLNLRGWLI PSNSNYVLQDFDVVFGHGAGSVVFVDK^CGFDGKPVIiPKNMWEFRDYFITONTDS t 

I GGVTYQLAWD VI RKDLS YEQ QNVLAI ES I HYLGTTGHTLKSGCKL INAKP PKYS S KVVL SGE WNAVYKAFGS P 
ITNGISLLDIIVTCPVFFNAFVTCCNCGSEl^ 

YAGLVVICHITNITGVSLWRVTAVHSDGMFVATSSYDALLHRNSLDPFCTDVN^ 

ASTGVID I SAGMFGLYDDILTNNKPWFVRKASGLFDAI WDAF VAAI KLVPTTTGGLVRFVKS I ASTVLTVSNGV 
IMCADVPDAFQPVYRTFTQAI CAAFDFSLDVFKIGDVKFKRLGD YVLTENALVRIjTTEVVRGVRDARI KKAMFT 
VWGPTTEVKFS VI ELATVNLRLVD CAPVVCPKGKI VVI AGQAFFYSGGFYRFMVDSTTVLNDPVFTGEIjFYTI 
FSGFKLDGFNHQFVNASSATDAI I AVELLLSDFKTAVFVYTCVVDGCS VI VRRDATFATHVCFKDCYS I WEOFC 
DNCGEPWFLTDYNAILQSNNPQCAIVQASESKVLLERFLPKCPEVL^ 

LTSNGLLGNCAKRFRRVLVKLLDVYNGFLBTVCS VVHTAGVC I KYYAVNVP YWI SGFVSRVIRRERCD VTFPC 
SCVTFFYEFLDTCFGVSKPNAIDVEHLELKETVFVEPKDGGQFFVSDDYLWYVVDDIYY^ 

GGKI SFSDDVI VHDVE PTHKVKLI FEFEDD WTSLCKKSFGKS 1 1 YTGDWEGLHEVLTSAMNVIGQHIKLPOFY 

R VTiDQSDNNCWI STTLIQLQLTKLLDDS I EMQLFKVGKVDS IVQKC YELSHLI SGSLGDSGKLLSELLKDKYTC 

ITFEMSCDCGKKPDEQVGCLFWIMPYTKIiFKKVRTNSAVL 

>~out: 472 to 738: Frame 1 89 aa 

LVLI SF VPKFYFFIiI DLL I CEVGS FFLTAIMFFRTLMLFLAMVQEVWFLWI S I CVVLMVTOLC YLKTCGXTLE ITL 

I I LI VLLL WSL I N 

>~out: 973 to 1125: Frame 1 51 aa 

LLNQFSL^LLNAIVVLRIGVLVHGMVIYLLV^ 

>~out: 2026 to 2316: Frame 1 97 aa

^wJwGSM^ 
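
The coordinates, frame, and length given in each ORF header above are mutually consistent, which can be checked with a short sketch. The function name is illustrative, and the frame convention used here ((start - 1) mod 3 + 1, with 1-based inclusive coordinates) is an assumption inferred from the headers in this listing, not a documented property of the ORF-finding program that produced them.

```python
def orf_frame_and_length(start, end):
    """Return (frame, amino-acid count) for an ORF at start..end (1-based, inclusive)."""
    span = end - start + 1              # nucleotides covered by the ORF
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    frame = (start - 1) % 3 + 1         # assumed convention: frames numbered 1..3
    return frame, span // 3             # three nucleotides per residue


# Headers from Sequence A: "140 to 310: Frame 2 57 aa" and "267 to 3761: Frame 3 1165 aa"
print(orf_frame_and_length(140, 310))   # (2, 57)
print(orf_frame_and_length(267, 3761))  # (3, 1165)
```

A header whose reported length disagrees with this calculation most likely carries a transcription error in one of its numbers.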

Alignment 

>gi 1 28*286 |pir|| £28600 hypothetical protein la - human coronavirus 
gi|59491|emb|CAA49877.1| ORF1a [Human coronavirus 229E]
Length = 4085

Score = 882 bits (2280), Expect = 0.0

Identities = 470/1159 (40%), Positives = 675/1159 (58%), Gaps = 7/1159 (0%)
Frame = +3 



Query: 276 MFYNQVTLAVASDSE I SGFGFAI PS VA VT^YSEAAAQGFQACRFVAFGLQDCVTGINDDD 455 

M N+ VTLAVASDS E I S G + + AVR YSEAA+ GF*ACRFV + LQDC+GI JDD 
Sbjct: 1 ^CNRVTIAVASDSEISANGCSTIAQAVRRYSK^SN^^ 60 

Query: 456 YVIALTGTNQLCAKSLLFSDRPI^ 632 

YV+ ^ G L 1+ FSDRP L GWL+FSNSNY +L++FDWPG G G+V + D+Y+ 
Sbjct: 61 YVWJmGNQTLFCNraKFSD 120

Query: 633 CGFDGKPVLPKNMWEFRDYFNDNTDSIVIGGVTY 812 

CG DGKPV+ +++ W+F D+F +N + I+I G TY AW RK L yXSe I Y 
Sbjct: 121 CGADGKPVMSEDLWQFVDHFGEN"- EE 1 1 INGHTYV(^WLTKRKPI^ YKRQNNLAIEE IE Y 179

Query: 813 L-GTTGHTLK£GCKLINAKPPKYSSKVVI^ 989 

+ G HTL++G L AK K SSKWLS + +YK FGSP +TNG ++L+ KPVF 
Sbjct: 180 VHGDALHTLRNGSVLEMAKEVKTSSK^^ 239 

Query: 990 PNAFVKCNCGSENWSVGAWDGYLS SCCGTPAKKLC\A^GNVVPGDV^ I TSTDAGCGVKYY 1169 

o„„ +A V+C CG+++WS VG W G+ SSCC + KLCWPGNV PGD AG G+KYf 

Sbjct: 240 ISALVQCTCGTKSWSVGDWTGFKSSCC^ 299

Query: 1170 AGLVvlCHITNITGVSLWRvTTAyHSDGMFv 1349 

. -,- - .. G+.+K.+ .NI OVS.+WRV A+ S__ -FVA+S+* H- N- +D -FCF+V- XT+t RL 

Sbjct: 300 CGMTLKFVANIEGVSWRVIALQSVDCFVASSTFVEEEHVTOIM^ 359

Query: 1350 AFIXSASVTEDVTOAASTGTODISAGMFGLYJ^ 1529

A LGA +T +V+ ++GVIDIS G F +YDDI +KFWFVRKA +F W A +A+ 
Sbjct: 360 AMLGAEMT SNVRRQVAS GVIDI STGWFDVYDD I FAESKPWFVRKAEDI FGPCWSALASAL 419 

Query: 1530 KLVPTTTGGLVRFVKS IASTVLTVSNGVI IMCADVPDAFQPVYRTFTQAI CAAFDFSIiDV 1709 

K + TTG LVRFVKSI ++ + V G I + A VP+ F + F AI PD +++ 
Sbjct: 420 KQLKVTTGELVRFVKS I CNSAVAVVGGTIQ ILAS VPEKFLNAFDVFVTAI QTVFDCAVET 479

Query: 1710 FKI GDVKFKRLGDYVLTENALVRLTTEVVRGVRDARI KKAMFTKWVGPTTEVKFS VI EL 1889 

I F ++ DYVL +NALV+L T ++GVR+ + K + VWG T EVK S +E 

Sbjct: 480 CTIAGKAFDKVFDYVLLDNALVKLVTTKLKGVRE 539



Query: 1890 ATVNLRLVDCAPWCPKGKIWIAGQAFFYSGGFYRF^ 2069 

+T L + + + +G WI A+P S G++R M +VL V+ + + 

Sbjct: 540 STAVLTIANNYSKLFDEGYTVVIGDVAYFVSDGYFRLMASPNSVLTTAV^ 599 

Query: 2070 SGPKLDGPNHQPVNASSATDAIIAVBIiLLSDPKTAVFVYTCVVDGCSVIVRRDATPATH^ 2249 

G + + F V + A++ V +++P+ Y+ V +IV+ + + + 

SbjCt: 600 MGTRPEKF - PTTVTCENLES AVLFVNDKI TEFQ IiDYSIDVIDNEIIVKPNISIiCVPIi 655 

Query: 2250 CFKDCYSIWEQFCiDNCGEFWFLTDYNAIL^ 2429 

+D W+ PC E WF DY A + + A V+A+ESK ++ +P CP +L 

SbjCt: 656 YVRDYVDKWDDFCRQYSNE S WFEDD YRAFI SVIJD I TDAAVKAAES KAFVDT I VP P CPS I L 715 

Query: 2430 LS IDDGHLWNLFVEKFNFVTDWXXXXXXXXXXXXXXXXCAKRFRW 2609 

ID G +WN ++ N VDW CAKRF+R Ii LL+ YN FIi+T 

SbjCt: 716 KVIDGGKIWNGVIKNWSVRDWLKSIiKLNL 775 

Query: 2610 VCS WHTAGVCI KYYAVNVPYWI SGFVSRVTRRERCD- - VTFPCVSCVTFFYEFLDTCF 2783 

V S V G+ K YA + PY+VI V +V + + PP + F F 
SbjCt: 776 WS0^lGGI*TPKTYAPDKPYIVIimiVCKVENKTEAEWIEIiFPH3^RIKSPSTFESAYM 835 

Query: 2784 GVSKPNAIDVEHLELKETVFVEPKDGGQPPVSDDYIiWYVVDDIYYPASCNGVLPVAF 2963 

++ p d+E +EI* + FVEP GG V D++++Y D +YYP++ +LPVAFTK 
Sbjct: 836 P IADPTHFD I EE VELLDAEFVE PGCGGI IoAVIDEHVF YKKDGVYYPSNGTNI LP VAFTKA 895

Query: 2964 AGGKISFSDDVI VHDVEPTHKVKLI FEFEDDVVTSLCKKSFGKS 1 1 YTGDWEGLHEVLTS 3143 

AGGK+SPSDDV V D+EP ++VKL FEFED+ + +C+K+ GK I + GDW+ + + S 
SbjCt: 896 AGGKVSPSDDVEVKDIEPVYRVKLCFEFEDEKLVDVCEKAIGKKIKHEGDWDSPCKTIQS 955 

Query: 3144 AMNVTGQHI KLPQFYI YDEEGGYDVS KPVMI SQWP IS DDSDGCWEASTDPHQLESV 3314 

A++V+ ++ LP +YIYDEEGG D+S PVMIS+WP+S + + + + D ++ V 

SbjCt: 956 ALSWSCYVNIiPTYYIYDEEGGNDLSLPVMISEWPLSVQQAQQEATLPDIAEDV — VDQV 1013 

Query: 3315 REEVDIIEQPPGEVEHA^lSIRQPFSFSFRDELGVRVIJDQSD^^CWISXXXXXXXXXXXOT 3494 

E I + +V+H +S PP P + G+++I* Q DNNCW++ D 
Sbjct: 1014 EEVNS IFDI ETVDVKHDVS PPEMPFEELNGI*KIIjKQLDNNCWVNSVMLQI QLTGILD 1070 

Query: 3495 DSIEMQLFKVGKOTSIVQKCYELSHM^ 3674 

MQ FK+G+V ++++CY I +T + + C C 

Sbjct: 1071 GDYAMQFFKMGRVAKMIERCYTAEQCIRGAMGDVGL 1130 . 

Query: 3675 KFDEQVGCLFWIMPYTKLP 3731 

E+ G + + P K P 
SbjCt: 1131 GRLEESGAVLPCTPTKKAF 1149 
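
The percentages in the header of this alignment appear to be integer-truncated ratios rather than rounded ones: 470/1159 is roughly 40.6% but is reported as 40%. A minimal sketch reproducing the three header values under that assumption follows; the helper name is hypothetical and is not part of any BLAST tool.

```python
def blast_percent(count, alignment_length):
    """Integer-truncated percentage, as the header values appear to be computed."""
    if alignment_length <= 0:
        raise ValueError("alignment length must be positive")
    return count * 100 // alignment_length   # floor division drops the fraction


# Values from the Sequence A header: identities, positives, gaps over 1159 columns
identities = blast_percent(470, 1159)
positives = blast_percent(675, 1159)
gaps = blast_percent(7, 1159)
print(identities, positives, gaps)   # 40 58 0
```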



2. Sequence B

1610 nucleotides; encodes part of replicase
TTTCTGCCTATGGAGGTC^GGTATGAT^ 
TATTCCACTTATGCCTCTTCTTAGTTGTGGTA 

TTGTGACATTAATAAACC2ATTGCAAGTTTTTGTTTATTCTTCAAATGAA 
TGGTTTAGATTTAACACGAGTCATTGACGATG^^ 
CTTTGATTGTGGTGTCAATGCCTTGGATCXS^ 
ACAAGGAGAATTATTGGACACAAAACTTAATGGTATT^ 

AACTXSTACCAGCTGGTAATTTGGTTAAACTTGTTGTTGAGAGTTGTACCATTTAT^ 

AAATGATCTTTCTTTTGATAAAAATCTTGG 

CAATGTTCCTGCTATTGATGTTTO^ 

TAATGTTATGGATGTTAACGACTGTT^ 

TAAAGATGTTGTTGTTGAGTCTTCTAAGTCACTT 

TGAAGGTGTTTTACCTATTAATACT^^ 

TTTTGAAAAGGCAGCACTVTTTTGCTTCT^ 

TAGAGTTCTTGGGACCACCGACAATAATTGTTGGGTTAATGCAACTTGTATAATOT 

TTTTAAATCTAAGGGTTTAAATGTTCTTTGGAACAAATTTGTTACAGGTGAT 

TTATTTTATAACTATGTCTTC^^ 

GATTAGTGATTCTATTGTTACTCTTGAAC^^ 

AAGTGCTGTTGTCTGTGCTAGTGTGCTTAAAGATGGTTC 

TTCACGTGTTAAGTTTGTTAATGGACGTGTTGTTATTACCAATGTTGGTGAACCTATAATTTC^ 

GTTGCTTAATGGTATTGCTTATACAACATTTTCAGGTTCTTTTGATAACGGTCACTATGTAGTTTATGATGCTG 

TAATAATGCTGTCTATGATGGTGCTC^^ 

AGTAGGTGGTTGTGTAACATCTAATTTCCACAACG 






Putative ORFs



>~out: 32 to 1609: Frame 2 526 aa
MVSIERYLENSSENGIPIMPLLSCGIPGVRIENSLKALFSaJINKPLQVFVySSNEEQAVLKFLDC 




DraPYGYPNDFVGGFRVLGT^^ 
10 N TNVGE PI I SQPSKIiLNGI AYTTFSGS FDNGHYVVYDAAIMAVYDGARLFASDL^ 

>~out: 366 to 524: Frame 3 53 aa 
CWINKDNYWTQNLMW 

Alignment 

>gi|12175747|ref|NP_073549.1| replicase polyprotein 1ab [Human coronavirus 229E]
gi|30179827|sp|Q05002|R1AB_CVH22 Replicase polyprotein 1ab (pp1ab) (ORF1ab polyprotein) [Includes:
Replicase polyprotein 1a (pp1a) (ORF1a)] [Contains: p9;
p87; p195 (Papain-like proteinases 1/2)
(PL1-PRO/PL2-PRO); Peptide HD2; 3C-like proteinase
(3CL-PRO) (3CLp) (M-PRO) (p34); Unknown protein 1; p5;
p23; p12; Growth factor-like peptide (GFL) (p16);
RNA-directed RNA polymerase (RdRp) (Pol) (p100); Helicase
(Hel) (p66) (p66-HEL); Unknown protein 2; p41; Unknown
protein 3]
gi|12082740|gb|AAG48591.1| replicase polyprotein 1ab [Human coronavirus 229E]
Length = 6758

Score = 429 bits (1104), Expect = e-119

Identities = 233/535 (43%), Positives = 323/535 (60%), Gaps = 18/535 (3%)

Frame = +2



Query: 41 IER^ENSS™GIPLMPLLSCGIPGVRIENSLKALPSCDINKPLQVFVirSSNEEQAVl,KF 220 

, I+ Y ++B G PL P+LSCGIFG+++E SL+ L K ++VFVY+ E v w ' 

Sbjct: 1372 IKRYNTII^QGTPLTPII.SCGIPGIKLETSl^VLLDVC^KEVKVPVyTDTEVCKV^ 1431 

Query: 221 LDGLDLTPVIDD VDWK PFRVEGNFSFFDCGVNAL -DGDIYLLPTNSIL 364 

Sbjct: 1432 TOVHVaWBQPKIBPia?}^^ 14gx 

Query: 365 MLDKQGQLLDTKl^GILQQAVI^^ 544 

LD +G LD L+G+L A+ D + K +P+GNL+K + S +YMCWPS n r. 
Sbjct: 1492 TUJDRGL&ICHALSGVLSAAIKDCVDINKAIPSGNLIKFDIGSVVV™ 1551 

45 QUery! 545 '^S^SSS**^^ -24 

Sbjct: 1552 NNVORCTRiaj^MCDIVCTIPADYILPLVLSSM * + 



-1TI 1605 






Query: 725 ^^^INVKDVVVESSKSLGKQLGWSDGVDSFEGVLP- -INTOTVLSVAPEVDWVAPY 898 

K+TEDG+NV DVV + KB +Q+GV++D G +P +NT +L+ A Ivnw w 

Sb 3 Ct: 1606 KVTEDGVNVHDVTVTTO^ 1665 

Query: 899 GFPJCAALFASLDVKPyGYPNDPVGGFRVLGTTD^CWVNATCIIt^YLKPTFKSKGL^ 1078 

GP+ A PA ++D + Y + V G RVL T+DNNCWVNA CI LOY KP P es^orT 
Sbjct: 1666 GFKDAVTFATVDHSAFAYESAVVNGXRVLKTSD1TOCWVNAVCIALQYS 1725 

Query: .10.79 WIBffiVTGDVGPFVSEIYFIXMSSKGQKGDAEEALSIQjSEYljlSDSlVTIiEOYSTCDTP ' i»w - 
or,. - WNKPV ODV KG KGDAE+ L+KLS+YL +++ VM S+? C 

Sb 3 ct: 1726 WNKFVLGDVEIFVAFVYWARLMKGDKGDAFJDTLTKLSKYI^FAQVQLEHY^ i 785 



1253 --^^T^^aA^^^ p rf "sstsps^ 118 "? 1426 

Sbjct: 1786 KFKNS VAS IMSAI VCASVKRDGVQVGYCVHG1 KYYSRVRSVRGRA1 X VSVEQLEPCAQgR 1845 

Query: 1427 LLNGIAYTTFSGSFDNGHYVVYDAANNftVYDGARLPASDLSTIAVTAIVVVGGCV 1595 

.. ,„ LL+G+AYT PSG D GHY VYD A ++YDG R DLS L+VT++V+VGG V 
Sbjct: 1846 LLS GVAYTAFSGPVDKGHYTVYDTAKKSM YDGDRFVKHDL SLL S VTSVV^GGYV 1900 



3. Sequence C 

6017 nucleotides; Encodes part of Replicase 
CGAGAACAGCTTGATTCGTTATTTGTTATACTTGTATTTTG 
TTATATTTTGTTGCACAATTTATTAGTACTO 
TTTGTGCCGTTTGATGTTTTATGTAATGAGTTTTt^
(^TATTATTGTTGGCTGTAATAATGCTGACTGTGT^^ 
CAAACTATTATTAATGGTATGC^TAAATCAT^ 

AACTTCTTTTGTGTTAATTGTGATTCTTTTGGGCCTGGTAATACTTTTATTAATGGTGATATTGCAA 
GGTAATGTTGTTAAAACAGCTGTTCAACC 
GGATTTTATCGTCTTTATAGTGGTGACACTTTTTGGCGG
AAAGAGGTTCTGAAGAATTGTAATGOTTTAGA^ 
ATTAAAAATGCTTGTGTTTATTTTTCT 

ACTTTATCAGTTGATTTTAATGGTGTTTTGCATAAGGCATATGTTGATGTTTT 

CTAACTGCTAACATGTCCATGGCTGAATGTAAAGCTAC^ 
GCTGTTGCCAATGCACATAGGTATGACGTCT^

AAACCTGAAGATAAGTTGTCCGTTTATGACATTGCTTGTTGTATGCGTGCCGGTTCTAAGGTTGTTAACCATAAT 

GTTTTAATCAAAGAGTCAATACCTATTGTTTGGGGTGTCAAG 

TACCTTGTTAAAACAACTAAAGCAAAGGGTTTGACTTTTT 

GTTCCTGCTACTAGTATAGTTGG&AAACAGGGTGCTGGT 
TTATTTGTTGTTGCATTGTTTATTGGTGTCTCATTT^

GATTTTAAGTACATTGAGAATGGTCAGTTGAA^ 

AATTTTAATCAATGGC^TGAGGCTAAGTTTGGTGTTGTTACTACTAATAGT 

GTTTCAGAGCGTATTAATGTTGTTCCTGGTGTTCCAACAAATGTATATTTGGTAGGAAAGACT 

TTAGAGGCTGCTTTTGGAAACACAGGTGTTTC 

AATTCTGCTTGTACTAGGTTGGAAGGTTTGGGTGGTGAGAATGTTTATTGTTA

TCTAAACCTTATAGTATTTTAC^GCCCAATGCTTATTATAA 
ATTTTAGCTAGAGGTTTTGGCTTACX3TACTATTAGAACTTTGGCT 
GACTC^CATAAAGGTGTTTGTTTTGGTTTTGATAAATGGTATGTO 
TGTGGTGATGGTCTTATAGACCTTCTTGTTAATGTACTCTCAATCTTTAGTTC^ 

TCTGGACATATGTTGTTTAATTTTCTTT

CGTGTTTTTGGTGATCTTTCTTATGGTGTTTTT^ 
GTTACTCAAAATTTATTTTTTATGTTGCTT 

TGGATTTGGCATATTGCATACATTGTTGCATACTTCTTGTTAATACCATGGTGGCTT 
GCTGGATTTTTAGAGCITTTTACCTAATGTTTTTAAGTTAAAAATCT 
ATAGGTACTTTTGAGAGTGCTGCTGCAGGTACATTTGTTC

ATTTCACCTGAGAAACTTAAGAATTATGCTGCAAGTTATAATAAATATAAATATT^ 

GCTGATTATCGTTGTGCTTGTTATGCTCATTTAGCCAAGGCT^ 

TTATATTCTCCACCTACCATTAGCTACAATTCCACCTTACAA 

TGTGTTGAGAGATGTGTGGTTCGCGTCTGTTATGGTAGTACTGTGCTTAATGGAGTTTGGTTAGGTGACACTGTT 

ACTTGTCCTAGACATGTCATAGCACC!ATCAACCACTGTTCTTAT^

TTGCATAATOTTTCAGTGTCTCATAATGGTGTCTT^ 

CGTATTAAGGTTTCACAATCTAATGTACATACA 

AATATTTTAGCATGTTATGAAGGTATTGCATCTGGTGTTTT 

GGTTCTTTTAtAAATGGAGCTTGTGGTOCTCCTGKSTTATAATGTTAGAAATGATO 
TTACACCAAATTGAGTTAGGTAGTGGTGCTCATGTTC

GACGAACCTAGTTTGCAAGTTGAGAGTGCCAACCTTATGCTATCAGATAA 

TTGTTGAATGGTTGTAGGTGGTGGTTGCGTT 

AATGGTTATACAATTGTTTCTAGTGTTGAGTGCTAT 

TTGTTAGCTTCCATTCAACATCTTCATGAAGGT^ 
GAGTTCACACTAGCTGAAGTT^^

AAAACAATGTTTTTATTTAGCGTTTTCTTCACAATGTTTTGGG^ 

ATAAACCCTGTTATACTTACACCTATATTTTGTTTAC^ 

CATAAGTTTTTGTTTTTGGAAGTATTTTTATT^ 

TATTACATAGTAAAATTTTTGGCT^ 

GTTA ATGT*U M lGGTCTGTTTATTO

TTTACATATGTGTGTTCTCTTATAGCAGTTGCTTACACTTATTTTTATAGTGGTC 
ATGTTTTTATGTGCTATATCTAGTGATTGGTACATTGGTGCCAT 
TTTTCACCTGAAAGTGTATTTAGTGTTTTTGGT 
TTAGTTTGTACTTATTGGGGCATT^ 

AAGGTGAGTGCTGCTGAATTTAAATACATOT^

TGGTTATCATTCAAATTACTTGGTATTGGTGGT 
GATTTGAAGTGTACTAATGTTGTGTTATTGG 
GCTTATTGTGTTGATTTACAC^ 
CTCCTTGCGTTCTTTCTAAGTAAAC^TAGT^ 
AGCACCCTGCAGAGTGTTGCTTG&TCATC

TATGAGGATGCTATTGCTAATGGATCTTCTTCTCAACTTATTAAACAATTGAAGCGTGCCATGAATATCG 
TCTGAATTTGATGATGAGATATCTGTTCAGAAGAZ^ 

AAAGAAGCACGCTCTGTTAATAGAAAATCTAAAGTTATTAGTGCTATGCACTCTTTACTTTTTGGAATGTTAAGA 



CGTTTGQATATGTCTAGTGTTGAAACT6TTTTGAATTTAGCACGTQATGGTGTT6TGCCATTGTCAGTTa r par , r 
GCA&CTTCAGCTTCGAAACTAACTATTGTTAGTCC^ 

GTTCATTATGCTGGAGTTGTTTGGACACOT^ 

ATTAC^GGGAGAATGTTGAAACTTTGACATG^ 

AATGAAATTATGCCTGGTAAACTTAAGCAAAAACCTAT^^ 

^^^^ AT ^ TACTGAGGGT( 3GTAAAACTTTTATGTATGCTTATATTTCTAATAAAGCTGACCT 
WGGTCCTC^GTGAAGTATTTGTATOT^ 

ATAGGTGCCACAATTCGTCTAGAAGCTGGTAAACAAACTGAATTGGCTGTTAATTCTGGACT 
GCTTTTTCTGTTGATCCAGC^CCACTO 

AAGATGTTATCTAATGGTGCTGGTAATGGTC^^ 
AAGGGTAAATGTGTTCyVGGTTCCTATTGGTTC^ 

Putative ORFs

>~out: 55 to 5997: Frame 1 1981 aa 

TYLRFGLLYFVAQFISTFGSFLGFHQKQWFLHFVPFDVL 
I^KRVPLQTIINGMHKSFYV^GGTCFCm^ 
DKVDFVNGFYRLYSGDTFVraYDFDITES^ 

VNSELLSTLSVDFNGVLHKAYVDVLCNSFFKELTA^SMAECKATLGLTVSDDDFVSAV 
NFFISYAKPEDKLSVYDIACCmAGSKVViraNVLIKESIPIWGVKDFNTLSQEGKK^ 

DNQAI TQVPATS IVAKQGAGFKRTYNFLWYVCliFWALFI GVS F ID YTTTVTS FHGYDFKYI ENGOLKVFKApS 

lltwfsfaaflellpnvfklkistqlfegdkfigtfesaaagtfvi^^ 

YSGSASEADYRCACYAHIAKAMLDYAKDHNDMLYSPPTISYNSTL^^ 
VT^GDTVTCPRHVIAPSTTVLIDYDHAYSTMRLHNFSVSHNGVFLGVVGVTMHGSVL 

l^gasfnil^ctegiasgvfgvot,rtnftikgsfingacgspgynvrndgtvef™ 



I YTNT IWINPVI LTP IFCIiLLFLSLVLTMFLKHKFIiFLQVFlipTVI 

^VQGLVNVLVC^FWFI^HTWRFSKERFTHWFTYVCSLIAVA 

^™SPESVFSWGDVKLTLVVra^ 

]f ^S?^'^' FKIj ' Ci ^" 1 " GG '- ) ^ < - ; I KI STVQSKLTDLKCTNWIiLGCIj SSMNIAANSSEWAYCVDLHinGSScDDPEB 
AQGMIjLALLAFB^LSKHSDFGIoDGLIDSYFDNSSTLQSVASSFVSMPSYIA 

uc^a^^Sf S^??^? SV ^^ INRM ^QAATQMYKEARS VNRKSKVI SAMHSLLFG^^^^DMSS^^^VLNLARDG^ 

RFMVETPNGPQVKYLYFVKNI^TLRRGAVLGFIGATIRLQAGKQTEL^VNSGr^TACAFS 

>~out: 263 to 511: Frame 2 83 aa 

>~out: 875 to 1054: Frame 2 60 aa 

LFLMMI LFQLLPMHI GMTFCFQI CHLI I FIiFIiMLNLKI S CPFMTIiLVVCVPVLRLIiTIMF 
>~out: 1556 to 1804: Frame 2 83 aa 

*"*?*5!L?..A 8 9.9L. .to. l?66:.Frame 2. -53-aa . 

>~out: 2600 to 2761: Frame 2 54 aa 

ITQKI IMTCYILHLPJ^TIPPYI^^ 

>~out: 2798 to 2980: Frame 2 61 aa 

HHQPLFLLIMIMHIVLCVCIIFQ<^^ 

>~out: 4595 to 4774: Frame 2 60 aa

VNIVILVLMALLILILXIVAPCR^ 

>~out: 4790 to 4945: Frame 2 52 aa

ISQSLNLIMRYLFIU^KIjIEWIjNK^ 

>~out: 5048 to 5200: Frame 2 51 aa 

LLLVQILNLILRI,FVM\^ 



>~out: 5753 to 5905: Frame 2 51 aa 

IVn^TPIKILMVERLFVCIVGPTF^^ 



Alignment 

>gi|12175747|ref|NP_073549.1| replicase polyprotein 1ab [Human coronavirus 229E]
gi|30179827|sp|Q05002|R1AB_CVH22 Replicase polyprotein 1ab (pp1ab) (ORF1ab polyprotein) [Includes:
Replicase polyprotein 1a (pp1a) (ORF1a)] [Contains: p9;
p87; p195 (Papain-like proteinases 1/2)
(PL1-PRO/PL2-PRO); Peptide HD2; 3C-like proteinase
(3CL-PRO) (3CLp) (M-PRO) (p34); Unknown protein 1; p5;
p23; p12; Growth factor-like peptide (GFL) (p16);
RNA-directed RNA polymerase (RdRp) (Pol) (p100); Helicase
(Hel) (p66) (p66-HEL); Unknown protein 2; p41; Unknown
protein 3]
gi|12082740|gb|AAG48591.1| replicase polyprotein 1ab [Human coronavirus 229E]
Length = 6758

Score = 2840 bits (7361), Expect = 0.0

Identities = 1350/1997 (67%), Positives = 1609/1997 (80%), Gaps = 4/1997 (0%) 
Frame = +1 

Query: 10 LDSLFVI LVFCNFW * T YLRFGLL YF VAQF I S TFGS FLGFHQKQWFIiHF VPFDVL CNEPLA 189 

+ V+++ F YLR LLYFVAQ 1ST G FLG+ + WFLHF+PFDV+C+E L 
Sbjct: 2076 MQP FI VMVLLLI FGDNYIiRCFLLYFVAQMI STVGVFLGYKETNWFIjHF I PFDVI CDELLV 2135 

Query: 190 TFIVCKIvXFVTOIIIVGCN^ 369 

T IV K++ FVRH++ GC N DC+ACSKSARLKR P+ TI+NG+ +SFYVNANGG+ FC 
Sbjct: 2136 TVIvTKVISFTOHvT^FGCENPDCIACS*^^ 2195 

Query: 370 KHNFFC^CDSFGPGOTFINGDIAREI^3NVTOTAVQPTAP 549 

KH FFCV+CDS+G G+TFI +++RELGN+ KT VQPT PAYV+IDKV+F NGFYRLYS 
Sbjct: 2196 KHRFFCVDCDSYGYGSTFITPEVSRELGNITKTNVQPTGPA^ 2255 

Query: 550 DTFWRYDFDITESKYSCKEVLKNCNVLEN^ 729 

+TFWRY+FDITESKYSCKEV KNCNVL++FIV+NN+G+N+TQ+KNA VYFSQLLC PIKL 
Sbjct: 2256 ETFWRYNFDITESKYSCKEVFKNCOT 2315 

Query: 730 VNSELLSTLSVDFNGVLHKAYVDVTjCNSFFI^ 909 

V+SELLSTLSVDFNGvTiHKAY+DVL NSF K+L ANMS+AECK LGL++SD +F SA++ 
Sbjct: 2316 VDSELLSTIiSVDFNGVTiHKAYIDVLRNS S 2375 

Query: 910 NAHRYDVLLSDLSFNNFFI SYAKPEDKLSVYDIACCMRAGSKVVNHNVTjIKESI PIVW 1089 

NAHR DVLLSDLSFNNF SYAKPE+KLS YD+ACCMRAG+KWN NVTi K+ PIVW 
Sbjct: 2376 NAHRCD VTjLSDLSFMNFVSSYAKPEEKLS IVWHA 2435 

Query: 1090 KDFNTLSQEGKKYLVKTTKAKGLTFLLTFNDNQAI TQVPATS I VAKQGAGFK- RTYNFLW 1266 

KDFN+LS 5G+KY+VKT+KAKGLTFLLT N+NQA+TQ+PATSIVAKQGAG + +LW 
Sbjct: 2436 KDFNSLSAEGRKYIVRTSKAKGLTFIiIiTINENQAVTQl PATS I VAKQGAGDAGHSLTWLW 2495 

Query: 1267 YVCLFWAL- FIGVSFIDYTT- - TVTSFHGYDFKYI ENGQLKVFEAPLHCVRNVFDNFNQ 1437 

+C V + F F+ Y V+SF GYDFKYIENGQLK FEAPL CWNVF+NF 

Sbjct: 2496 LLCGIjVCLIQFYLCTFMPYFM^ 2555 

Query: 1438 WHEAKFGWTTOSDKCPIWGVSERINWPGVP^ 1617 

WH AKFG N CPIWGVSE +N V G+P+NVYLVGKTL+FTLQAAFGN GVCYD 
Sbjct: 2556 WHYAKFGFTPLI^SCPIWGVSEIVim^GIPSNv^ 2615 

Query: 1618 DGVTTSDKCI FNSACTRLEGLGGDNVYCYNTDLIEGSIO? YS I LQPNAYYKYDVKNYVRFP 1797

GVTT +KCIF SACTRLEGLGG+NVYCYNT L+EGS PYS +Q NAYYKYD N+++ P 
Sbjct: 2616 FG TOTPEKCI FTSACTRLEGLGGNNVYCYNTALMEGSLPYSS IQANAYYKYDNGNFI KLP 2675 

Query: 1798 E I IJ^GFGLRTIRTLATRYCRVGECRDSHKGVCFGFDKWYVTTDGRVDDGYI CGDGLIDXX 1977 

E++A+GFG RT+RT+AT+YCRVGEC +S+ GVCFGFDKW+VNDGRV +GY+CG GL + 
Sbjct: 2676 EVIAQGFGFRTVRTIATKYCRVGECVESNAGVCFGFDKWFVETOGRVANGYVCGTGLWNLV 2735 

Query: 1978 XXXXXXXXXXXXXXAMSGHMLFN^ 2157 

AMSG +L N F F CFLVTKF+R+FGDLS GV TW A L+ 

Sbjct: 2736 FN I LSMFS SSFSVAAMSGQI LLNCALGAFAIFCCFLVTKFRI^F^ 2795 

Query: 2158 NNI SYVVTQNLFFMLLYAILYFVFTRTv^YAW^ 2337
NN+SY+VTQNL M+ YAILYF TR++RYAWIW AY++AY PWWI, W+ A 



Sbjct: 2796 NNVSYIVTQNLVTMIAYAILYFFATRSLRYAWIWC^YLIAYISFAPWWL^WYFIAMLT 2855 

Query: 2338 ^^^^^^STQLFBGDKFIGTFESAAAGTFVLDMI^YERIiINTISPEKIiXXXXXXXX 2517 

. , LLP++ KLK+ST LFEGDKF+GTFESAAAGTFV+DMRSYE+L N+ISPEKL^^^ 

Sbjct: 2856 GLLPSIiKLKVSTNLFEGDKFVGTFESAAAGTFVIDMRSYEKLANSISPEKLKSYAftSYN 2915 

Query: 2518 XXXXXXXXXXFJffiYRCACYAHL^^ 2697 

«avlnh dot* Dvvwo ^ TO »^ TO ^ CTA+ ^ Kft ^ +++DHND+I 'Y+PPT+SY STLQ+GL+KMAQPS 

Sbjct: 2916 RYKYYSGNAlffiADYRC^CYAYLAKAMLDFSRDHiroiLYTPPTVSYGSTIjQAGLM 2975

Query: 2698 GCVERCVV^Y^ 2877 
_ . , G VE+CWRVCYG+TVLNG+WLGD V CPRHV1A +TT 1DYDH YS MRLHNtoI 

Sbjct: 2976 WVfflKWVHVCTO^^ 3035 

Query: 2878 G ^^ G ^ G ^5v^ GSVIiR * KVSQSNVHTPKHVFKTLKPGASFNI LACYEGI ASGVFGVNLR 3057 

FLGWG TMHG L+IKVSQ+N+HTP+H F+TLK G FNILACY+G irowwSli 
Sbjct: 3036 TAFLGWGATMHGVTLKIKVSQTNMHTPRHSFRTL^G^^ 3095

Query: 3058 ^^^^^J^®ACGSPGYNVRKODGTVEFCnn^QIEl.GSGAHVGSDFTGSVYGKFDDQPS 3237 

TN+ TI+GSFINGACGSPGYN++N G VEF Y+HQIELGSG+HVGS J? r- Iv£ i?!Znr£? 
Sbjct: 3096 ^IRGSFINGACGSPGYI^KN-GEVEFVY^ 3154 

Query: 3238 JflTOSMIIM^^ 3417 

. „,„ LQVESAN ML+ NWAFLYAA+LNGC WWL+ ++ V+ +NEWA ANG+T ++ + +SI 
Sbjct: 3155 LQVESANQMLTVNWAFLYAAILNGCTWWLKGE^ 3214 

Query: 3418 ^^^^^^ST^QLLAS IQHIiHEGFGGKNI LGYS S LCDE FTIiAEWKQMYGVlSILQSGKVl FG 3597 

_.. , IAAKTGV VE+LL +IQ L+ GFGGK ILGYSSL DEF++ EWKOM+GOTTLOSr^ 

Sbjct: 3215 LAAKTGVCVERLLHAIQVIJTOGFGGKQILGYSSLllDEFSnj^^ 3274 

Query: 3598 L ™-f™^^ 377? 
SbDCt: 3275 FKSlSLFAGFFVMFWAELFVYTTTIVTVlTPGFIjTPFMIIiLVALSLCLTFV^ 3334 

stLT L'L 8 ^ M K a S W 5^^ 3957 
Sb 3 Ct: 3335 LLPSXXVAAXQNCAWDYHVTKVI^ 3394 

Query: 3958 ^^F^CSLIA^YTYFYSGDFI.SLLWLCAISSDWYIGAIVFRLSRLIIFFSPE 4137 
Sbjct, 339= Bi« OT »CT Y LF SI ,IAVI.y ral ,V Srol V S ^^^i^^Igj c 5 F< ^| v 345, 
QU ° ry! 4138 ^'^^'^^j^^^P'^IOSraVCTYtraiLYHPBEPFKCTOQVXOTKreiUUWKXMWlHQ 4317 

Ouory: 4=78 J^f™a»MOMnpi^ 4 SS7 
3=3= ^1^^8SSi«J^^J S lXSSK Z 

sbjct, 3=95 8»awmi3MvmuKiOTVBSMHSLi.PGmjMa^ 1 is 3754 



Query: 5578 G ^TEIAVNSGLLTACAFSVDPATTYI^^ 5757 
GKQTE NS LLT C+F+VDPA YL+AVK GAKPV NC+K>IL+NG+G+GQAIT ++D+^ 



Sbjct: 3935 GKQTEFVSNSHLLTHCSFATO^ 39^4 

Query: 5758 NTNQDS YGGAS I CLYCRAHVPHP SMDGYCKFKGKCVQVP I G CLD P IRFCL^PTVCNVCGC 5937 

NT QD+YGGAS+C+YCRAHV HP+MDG+C++KGK VQVPIG DPIRFCLEN VC VCGC 
Sbjct: 3995 NTTQDTYGGASVCIYCRAHVAHPTMDGFCQYKGKWVQVP 4054 

Query: 5938 WLGHGCACDRTTIQSVD 5988 

WI* HGC CDRT IQS D 
Sbjct: 4055 WLNHGCTCDRTAIQSFD 4071 
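
Because the query here is a nucleotide sequence aligned in translation (Frame = +1), the query coordinates of each alignment row advance three nucleotides per residue, while the subject coordinates advance one per residue. The sketch below checks the final row above under the assumption that the row contains no gaps; the function name is illustrative only.

```python
def translated_span_residues(q_start, q_end):
    """Number of residues encoded by a nucleotide span (1-based, inclusive)."""
    span = q_end - q_start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("translated span must be a positive multiple of 3")
    return span // 3


# Final row of the Sequence C alignment (assumed ungapped)
query_row = "WLGHGCACDRTTIQSVD"   # Query: 5938 .. 5988 (nucleotide coordinates)
sbjct_row = "WLNHGCTCDRTAIQSFD"   # Sbjct: 4055 .. 4071 (protein coordinates)

query_residues = translated_span_residues(5938, 5988)
sbjct_residues = 4071 - 4055 + 1
identical = sum(q == s for q, s in zip(query_row, sbjct_row))

print(query_residues, sbjct_residues, identical)   # 17 17 13
```

The same check applied to any other row gives a quick way to spot end coordinates that were damaged in transcription.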

4. Sequence D 

TGTTCGTGCTTTTGACATC^ 

TAAAAATGCTGATCTTAAGGATGGTTATTTT^^ 
CATGTATAACCTACTTAACTTTTCTGGTGCTTTGGCTGAGCATGAT^ 

TTATGGTAATGTTAGTAGAC^TAATCT^^ 

TGAACAAAATTGTGATGTTCTAAAAGAAGTATTAGTTTTAACTGGTTGTTGTGACAATTCTTATTT 

GGGTTGGTATGACCCAGTTGAAAATGAAGATATACAT^ 

TATGCTTAAATGCGTTGCTCTAT^^ 

AGATCTTAATGGTAACrTTTATGAT^ 

ATCATATTATTCTTATATGATGCCTATTATGGGTTTAACTAATTGTTTAGCTAGTGAGTGTTTTGTCAAGAGTGA 

TATTTTTGGTAGTGATTTTAAAACTT^ 

TAAGTACTTTAAGCATTGGAGTTTTGATT^^ 

TTGTGCTAATTTTAATACACTATTTGCCACAAC 

TATAGATGGTGTTCCACTTGTTACAACTGCT^ 

XAACACACACTCAGTTAGGTTGACAATCACTOAACT^ 

TTCTC<^GCACTCGTTGATGAACGC^CTATTTGTTTTTCTO 

TGTTAAGCCAGGTCATTTTAATGAAGAGTTTO 

ACTTACATTAAAACATTTCTTCOT 

TAAGCCTACC^TTTTAGATATTTGTCAAGC^^ 

AGGTGGCTGTATTAAGGCATGTGAAGTTGTTGTAACAAATC 

TGGTAAAGCTAGTTTGTATTACGAATCTAT^^ 

TGTCCTCCCTACTATGAC^CAGCTGAATCTTAAGTATGCTATTAGTGGTAAAGAACGTGCTAGAACTGTTGGTGG 
TGTTTCTCTGTTGTCCACAATGACCACAAGACAATACCAT^^ 

TGCCACTGTTGTTATTGGTACTACCAAATTTTATGGTGGTTGGAATAATATGTTGCGTACTTTAATTGATGGTGT 

TGAAAACCCmTGCTCATGGGTTGGGATTATCCCAAATGTGATAGAGCTTTGCCTAA 

AGCCATGGTGTTGGGTTCTAAGCATGTTAATTGTTC 

GGC^GAAGTTTTAACAGAAGTTGTTTATTCTAATGGTGGTTTTTATTTTAA 
CGCTAGTACAGCTTATGCTAATTCTATTTTTAAGATTTTTCAAGCCGTGAGTTCTAA 
TGTCCCATCAGATTCATGTAATAATGTTAATGTTAGGGATCTACAACGACGTC 
AACTAGTGTTGAAGAGTGATTCATTGAT^ 

TGATGACGGTGTTGTCTGTTATAACAAGGATTATGCTGAGTTAGGTTATAT^ 

CACTTTGTATTACCAGAATAATGTCTTTATGAGTACTTCT 

ACATGAGTTTTGTTCCCAGCATACTATGCAAATAGTTGATAT^GATGGTAC 

TAGTAGGATCTTGTCAGCTGGTGTTTTTGTTGATGATGTTGTTAAGA 

TGTGTCTTTAGCTATTGATGCATACCCTCTOT 

ACTTGATTGGGTTAAGCATCTTAACAAAAATTTGAATGAGGGTGTTCTTGAATCTTTTTCTGTTACACTTCTTGA 

TAATCAAGAAGATAAGTTTTGGTGTGAAGATTTCT 

TOQCTTATGTGTTGTTTC^ 

TAAATGTGCaTATGATCATGTATTTGGTACCGLACCACAAG 

ATC^GGTTGTGGTGTTAGTGATGTTAAAAAA 

AC^GTTGTCTTTTcCATTATGTTCTGCTGGTAATATATTTGG 

TGOTGAAGTTTTTAATAGGCTTGCAACGTCTGATTGGACTGATGTTAGGGACTATAAACTTG 

AGATAGACTTAGACTCTTTGCGGCTGAAACTATTAAAGCTAAAGAAGAGAGTGTTAAGTCTTCTTATGCTTTTGC 

AACTCTTAAAGAGGTTGTTGGACCTAAAGAATTGCTTCOT 

TCGTAATTCTGTTTTCACCTGTTTT 

GGTTGAATATGGTTCTGATACTGTTACGTATAAGTCTACTGTAACC^CTAAGTTAGTTCCTGGTATGATTTTTGT 

CTTAACATCTCACAATGTTCAACCTCT 

ATTGCACCCTGCTTTTAATGTCAGTGATGC^ 

GATAACTACAATACAGGGTCCTCCTGGTAGTGGTAAGTCAC^ 

TGCGCGTATTGTTTTTGTTGCTTGTGCCCATGCTGCTGTTGATTCCTTATGTGCAAAAGCTATG 
GATTGATAAGTGTACTAGGATTATACCTGCAAGAGCTCGGGTTGAGTGTTATAGTGGCTTTAl^ 
TAGTGOACAATACATATTTAGC^CTGTTAACGCATTACCTGAGTGTAATGCT 
TTCAATGTGTACAAATTATGACCTTTCTG 

TCCACAACAACTTCCTGCACCTAGAGTAATGATTACTAAAGGTGTTATGGAGCCTGTTGATTATAACGTTGTTAC 
TCAACGTATGTGTGCTATAGGCCCTGATGTTTTTCTTCATAAATGTTATAGATGTCCTGCTGAAATAGTTAATAC 



AGTTTCTGAACTTGTTTATGAGAACAAGTTTGTCCCTC^ 

TAAGGGTAATGTACAGGTTGACAATGGCTCTAGTAOT 

TAAAAATCCAAGTTGGAGTAAGGCTGTGTTTATTTCTC 

AGGACTTCAAATTCAAACTGTTGATTCTTCTCAAGGTAGTGAGTATGATO 

C^CTGCAC^TGCTTGCAATGTAAACCGTTTTAATGCT 

GTGTGATAAAACTTTGTTTGATTC^CTTi^^ 

TGGCTTGTTTAAAAATTGTACACGCACTCCT 

AGATCAGTTTAAGACTACAGGTGATTTAGCTGTTCAAATAGGT 

ATGATTTATGGGTTTTAGGTTTGATATTAGTATTCCTGGTAGTGATAGTTTG 

TCGTAATGTGCGTGGTTGGTTGGGTATGGATGTT^ 

TCCTTTACAGGTTGGTTTTTCAAATGGTGTTAATTTTO 

TGATGTTATTAAACCTGTTTGTGCAAAATCTCCACCAGGTG^ 

AGGACAACCTTGGTTAATTGTTCGIIAGACGCAT^ 

TCTTGTCTTTGTTTTGTGGGCAGGTAGTTTGGAATTAACTACAATGCGTTAC 

ATATTGTTATTGTGGTAATTCTGCC^CT^^ 

GGGTTGTGATTATGTTTACAATCCGTATGCTTT^ 

Hypothesized ORFs

>~out: -1 to 5320: Frame 2 1774 aa 

SL IRRARGS SAARX.E PCNGTD IDKCVRAFDI YNKNVS FLGKCLKMNCVRFKNADLKDGYFVIKRCTKSVMEHEQ c 

MYl^LNFSGAIiAEHDFFTWK^ 

GWYDPVENEDIHRVYASLGKIVARAMLKCV^ 

SYYSYMMPIMGLTNC^SECFVKSDIFG 

CANFNTLFATTI PGTAFGPLCRKVFIDGVPLVTTAGYHFKQLGIiVWNKDVNTHSVRLTI TELLQFVTDPSLI IA C 

SPALVDQRTI CFSVAALSTGLTNQWKPGHFNEE FYNFIiRIiRGFFDEGSEIiTLKHFFFAQNGDAAVKDFDFYRYP 

KPT IIiD I CQARVT YKI VSRYFD I YEGGCI KACE VVVTNLWKSAGWPIiNKFGKASLYYE S I S YEEQDALFALTKRE 

VLPTMTQLNIiKYAI SGKERARTVGGVSLIiSTMTTRQYHQKHIiKS I VWTRNATVVIGTTKPYGGWNNMIjRTIi idg\ 

ENPMLMGWDYPKCDRALPNMIRMISAiyr^ 

ASTAYANSIFNIFQAVSSNIl^IiSVPSDSCN^^ 

DDGVVCYNKDYAELGYIADISAFKATLYYQNNVFMSTO 

SRII1SAGVFVDDVVKTDAVVLI.ERYVSI1AIDAYPLSKHPNSEYRKVFY 

NQEDKFWCEDFYASMYENSTILQAAGLCWCGSQ 

SGCGVSDVKXLYLGGLBHre^ 

DTIiRLFAAETIKAKEESVKSSYAFATLKEVVGPKE^ SKDSKFGI GEFIFEB 

^^^y^« < q TXtrrt urn % rr.r ft msT Tmrnirr t rr>/^'» m ~n i— it t-i- m*-ii-T-B.i-r -rj~* „» — . , "** ** ** ' * 



SAQYI FSTVNALPE CHAD I VVVDEVSMCTNYDLSVINQRLS YKHI VYVGDPQQL PAPRVMI TKGVMEPVDYNVV1 
QRMCAIGPDWIaHKCYRCPAEIVNTVSELVYENKFVPVKPASKQCFKI FFKGNVQVDNGS S INRKQIiEI VKLFJjV 
KNPSWSKAVFISPYNSQNYVASRFLGIjQIQTVDSSQGSEYDYVI YAQTSDTAHACNVNRFNVAITRAKKGI FCW 
CDKTLFDSLKFFEIKHADLHSSQVCGLFKNC 

SFMGFRFD I S I PGSHSLFCTRDFAIRNVRGWLGMDVESAHVCGDNI GTNVPLQVGFSNGVNFVVQTEGCVSTNFG 
DVI KPVGAKS PPGEQFRHZATPFLRKGQPWLI VRRRI VQMI SDYLSNLSD I LVFVLWAGSLELTTMRYFVKIGP TK 
YCYCGNSATCYNSVSl^YCCFKHAIiGCDYVYNPYAFDIQQWGYVGSLSQ 
>~out: 189 to 341: Frame 3 51 aa 

>~out: 726 to 977: Frame 3 84 aa 

LVSVIJSRVIFLVVIL^ 

HYWKFL 

>~out: 2661 to 2903: Frame 3 81 aa

MRWLNIJFIJLHFLIIKKISFGViaFMLVC 

VPTTSLFWL 

>~out: 3075 to 3296: Frame 3 74 aa 

MLKFLIGLQRLIGLM^ 

LNHL 

>~out: 3741 to 3890: Frame 3 50 aa 

LFIALISVLGLYL^^ 

>~out: 4500 to 4676: Frame 3 59 aa

CVIKLCIJHI^FUtL^ 

>~out: 4692 to 4862: Frame 3 57 aa

VQIMFVLMNML^ 

>~out: 4866 to 5039: Frame 3 58 aa 
VIMFLYKLWQMVLIIXCK^ 
>~out: 5166 to 5315: Frame 3 50 aa 
GQLNIVIVVII^LVnQLVMNIV^ 



Alignment

>gi|12175747|ref|NP_073549.1| replicase polyprotein 1ab [Human coronavirus 229E]
gi|30179827|sp|Q05002|R1AB_CVH22 Replicase polyprotein 1ab (pp1ab) (ORF1ab polyprotein) [Includes:
Replicase polyprotein 1a (pp1a) (ORF1a)] [Contains: p9;
p87; p195 (Papain-like proteinases 1/2)
(PL1-PRO/PL2-PRO); Peptide HD2; 3C-like proteinase
(3CL-PRO) (3CLp) (M-PRO) (p34); Unknown protein 1; p5;
p23; p12; Growth factor-like peptide (GFL) (p16);
RNA-directed RNA polymerase (RdRp) (Pol) (p100); Helicase
(Hel) (p66) (p66-HEL); Unknown protein 2; p41; Unknown
protein 3]
gi|12082740|gb|AAG48591.1| replicase polyprotein 1ab [Human coronavirus 229E]
Length = 6758

Score = 3137 bits (8134), Expect = 0.0

Identities = 1465/1773 (82%), Positives = 1633/1773 (92%)
Frame = +2 

Query: 2 SLIRRARGSSAARLEPCNGTDIDKCVT^ 181
Query, z s + RRGSSAARIJSPCNGTD ID CVRAFD+YNK+ SF+GK LK NCVRFKN D D ++ 
Sbjct: 4073 SYI^WGSSAARLEPOTGTDIDYC^^ 4132 

Query: 182 VIKRCTKSVMEHEQSMYNLI^ 361
Query. 1B2 ^^^^^SMYNLL A+A+HDFFTW +GR IYGNVSR +LTKYTMMDL +A 
Sbjct: 4133 IVTCRCIKSVMDHEQSMYITLLK^ 4192

Query: 362 MRNFDEQNCDVLKEVLVLTGCCDNSYFDSKGWYDPVENEDIH^ 541
Query. 362 «™£~ v ke+lvLTGCC YF+ K W+DP+ENEDIHRVYA+LGK+VA AMLKCV 
Sbjct: 4193 I^mFDEKDCEVFKEILV^TGCCSTDYFBMKNWFDPIENED 4252 

Query: 542 ALCDAMVAKGWGVLTLDNQDLNGNFYX)FGDFWSLPNMGVPCCT 721
Query. 542 ^^DIW KGy^ P MG+P CTSYYSYMMP+MG+TNC 

Sbjct: 4253 AFCDEMVLKGWGVLTLDNQDLNGNFTO^ 4312 

Query: 722 LASECFVKSDIFGSDFKTFDLIjKYDFTEHKENIiFNKYFK^^ 901
" WUery. lasecp+KSDIFG DFKTFDLLKYDFTEHKE LFNKYFK+W DYHP+C DC+D+MC+ + H 

Sbjct: 4313 IiASECFMKSDIFGQDFKTFDLLKTXX>FTEHKE 4372 

Query: 902 C/ySTFNTLFATTIPGTAFGPLCRKVM 1081
wuc y . c+NFNTLFATTIP TAFGPLCRKVF IDGVP + V TAGYHFKQLGLVWNKDVNTHS RLTIT 

Sbjct: 4373 CSNFNTLFArTIPOTAFGPLCRKvFIDGV^ 4432 

Query: 1082 ELLQFVTDPSLI IASS PALVDQRTI CFSVAALSTGLTNQWKPGHFNEEFYNFLRIjRGFF 1261
EIiLQFVTDP+LI+2^SSPALVD+RT+CFSVAALSTGLT+Q VKPGHFN+EFY+FLR +GFF 
Sbjct: 4433 ELLQFVTDPTLIVASSPALVDKRTVCFSVAALSTGLTSQTVKPGHFNKEFYDFLRSQGFF 4492

Query: 1262 DEGSELTLKHFFFAQNGDAAVKDFDFYRYNKPTILDI 1441

DEGSELTIiKHFFF Q GDAA+KDFD+YRYN+PT+LDI QARV Y++ +RYFD YEGGCI 

Sbjct: 4493 DEGSELTIiKHFFFTQKGDAAI KDFDYYRYNRPTMIiDI GQARVAYQVAARYFDCYEGGCIT 4552 


Query: 1442 ACEVVVTNLNKSAGWPLNKFGKASL YYES I S YEEQDALFALTKRNVLPTMTQIjNLKYAI S 1621

J ' + EVVVTNIjNKSAGWPLNKFGKA LYYESISYEEQDA+F+LTKRN+LPTMTQLNLKYAIS 

Sbjct: 4553 SREVVVTJOLNKSAGWPLNKFGKAGLYYES I S YEEQDAI FSLTKPNI LPTMTQLNIiKYAI S 4612 

Query: 1622 GKERARTVGGVSLLSTMTTRQYHQKHLKS IVNTRNATWI GTTKFYGGWNNMLRTIiIDGV 1801
GKERARTVGGVSLL+™TTRQ+HQK LKSIV TRNATWIGTTKFYGGW+-NMIJ+ L+ V 
Sbjct: 4613 GKERARTVGGVSIjIjATMTTRQFHQKCIiKS I VATRNAT WT GTTKFYGGWDNMLKNIjMADV 4672 

Query: 1802 EOTMIiMGWDYPKCDRALPNMIRMI SAMVLGS 1981
60 • * ' ++ p LMGWDYPKCDRA+P+MIRM+SAM+LGSKHV CCT +D+FYRL NELAQVLTEWYS 

Sbjct: 4673 DDPKLMGVTOYPKCDRAI^ 4732 

Query: 1982 NGGFYFKPGGTTSGDASTAYANS I FNIFQAVSSNIlJRIiI*SVPSDSCNITVNVRDLQRRLYD 2161
* NGGFYFKPGGTTSGDA+TAYANS+FNIFQAVSSNIN +LSV S +CNN NV+ LQR+LYD 

Sbjct: 4733 NGGFYFKPGGTTSGDATTAYANSVFNIFQAVSSNINCV^ 4792

Query: 2162 NCYRLTSVEESFIDDYYGYLRKHFSI^ILSDDGVVCYNKDYAELGyiADISAFKATLYYQ 2341

NCYR ++V+ESF+DD+YGYL+KHFSMMILSDD WCYNK YA LGYIADISAFKATLYYQ 
Sbjct: 4793 NCYRNSNVDESFVDDFYGYLQKHFSMMILSDDSVVCYNKTYAGLGYIADISAFKATLYY 4852 

Query: 2342 NNVFMSTSKCWVEEDLTKGPHEFCSQHTMQIVDKDGTYYLPYPDPSRILS^ 2521 

N VFMST+KCW EEDL+ GPHEFCSQHTMQIVD++G YYLPYPDPSRI +SAGVFVDD+ K 
Sbjct: 4853 NGVFMSTAKGWTEEDLS IGPHEFCSQHTMQI VDENGKYYLPYPDPSRI I SAGVFVDD1 TK 4912
Query: 2522 TDAVVLLERYVSLAIDAYPLSKHPNS E YRKVF YVLLDWVKHIiNKNLNEGVIjE S FS VTLLD 2701

TDAV+LLERYVS LAIDAYPLSKHP EYRKVFY liLDWVKHLNK LNEGVIiESFSVTIiIjD 
Sbjct: 4913 TDAVILLERYVSIiAIDAYPLSKHPKPEYRKVFYAL^ 4972 

Query: 2702 NQEDKFWCEDFYASMYENSTILQAAGLCWCGSQT^ 2881 

E KFW E FYASMYE ST+LQAAGLCWCGSQTVLRCGDCLR+PMLCTKCAYDHVFGT 
Sbjct: 4973 EHESKFWDESFYASMYEKSTVLQAAGLCWCGSQTVLRCGDCLRRPMLCTKCAYDHVFGT 5032 

Query: 2882 DHKFILAITPYVCN&SGCGVSDVKKL^ 3061

DHKFILAITPYVCN SGC V+DV KLYLGGLNYYC +HKP LSFPLCSAGN+FGLYK+SA 
Sbjct: 5033 DHKFIIiAITPYVCNTSGCNVNDVTKLYLGGLNYYCVDHKPHL 5092

Query: 3062 TGSLDVEVFNRLATSDWTDVRDYKLAND^ 3241 

GS+D++VFN+L+TSDW+D+RDYKIAND K++LRLFAAET+KAKBESVKSSYA+ATLKE+ 
Sbjct: 5093 LGSMDIDVFNKLSTSDWSDIRDYKItMSD^^ 5152 

Query: 3242 VGPKEI1LI1SWESGKVKPPLNRNSVFTCFQI SKDS KFQIGEFI FEKVEYGSDTVTYKSTVT 3421 

VGPKELLL WESGK KPPLNRNSVFTCFQI +KDSKFQ+GEF+FEKV+YGSDTVTYKST T 
Sbjct: 5153 VGPIOSIiLIiLWESGKAKPPIiNRNSVFTCFQITKDSKFQ 5212 

Query: 3422 TKLVPGMIFVLTSHWQPLRAPTIANQEKYSSIYKLHPAFNVSDAYANIiVPYYQLlGKQK 3601

TKLVPGM+F+LTSHNV PIiRAPT+ANQEKYS+IYKIiHP-f FNVSDAYANLVPYYQLIGKQ+
Sbjct: 5213 TKLVPGMLFIfcTSHNVAPIi^ 5272

Query: 3602 I TTIQGPPGSGKSHCS IGLGLYYPGARI VFVACAHAAVDSLGAKAMTVYS IDKCTRI I PA 3781

ITTIQGPPGSGKSHCS IG+G+YYPGARIVF AC+HAAVDSLCAKA+T YS+DKCTRIIPA
Sbjct: 5273 ITTIQGPPGSGKSHCS IGIGVYYPGARI VFTACSHAAVDSLCAKAVTAYSVDKCTRI I PA 5332

Query: 3782 RARVECYSGFKPl^SAQYIFSTVNALPECNADIVVVDEVS 3961

RARVECYSGFKPNN SAQY+FSTVNAtiPE NADIVWDEVSMCTNYDLSVINQR+SYKHI
Sbjct: 5333 RARVECYSGFKPN^SAQYVFSTVNALPEWAD 5392

Query: 3962 VYVGDPQQLPAPRVMITKGVMEPVDYNV^ 4141

VYVGDPQQLPAPRV+I +KGVMEP +DYNWTQRMCAI GPDVFIiHKCYRCPAE I VNTVSELV
Sbjct: 5393 VYVGDPQ QLPAPRVL I S KGVMEP IDYWVVTQX^MCAI GPDVFIiHKCYRCPAE I VNTVS ELiV 5452

Query: 4142 YENKFVPVKPASKQCFKIFFKGNVQVDNGSS INRKQLEI VKLFLVKNPSWSKAVF ISPYN 4321

YENKFVPVK ASKQCFKIF +G+VQVDNGSS INR+QL+ + VK F+ KN +WSKAVFISPYN
Sbjct: 5453 YENKFVPVKEAS KQCFKI FERGS VQVDNGSS INRRQLDWKRFI HKNSTWSKAVF I SPYN 5512

Query: 4322 SQNYY^RFLGLQIQTVDSSQGSEYDYVIYAQTSDTAHACNVNRF 4501

SQNYVA+R I1GI1Q QTVDS+QGSE YDYVI +AQTSDTAHACN NRFNVAITRAKKGIFC+M
Sbjct: 5513 SQNYVAARLLGLQTQTVDSAQGSEYDYVIFAQTSDTAHACN 5572

Query: 4502 CDKTLFDSLKFFEIKHADLHSSQVCGIiFKNCTRTPLNIiPPTHAHTFLSLSDQFKTTGDL^ 4681

D+TLFD+LKFFEI DL S CGLFK+C R P++LPP+HA T+IiSLSD+FKT+GDIiA
Sbjct: 5573 SDRTLFDALKFFEITMTDLQSESSCGIiFKDCtf^^ 5632

Query: 4682 VQIGSNNVCTYEHVI SFMGFRFDI S I PGSHSLFCTRDFAIRNVRGWLGMDVESAHVCGDN 4861

VQIG+NNVCTYEHVTS +MGFRFD+S+ PGSHSLFCTRDFA+R+VRGWIjGMDVE AHV GDN
Sbjct: 5633 VQ IGNNNVCTYEHVI S YMGFRFDVSMPGSHS LFCTRDFAMRHVRGWLGMDVEGAHVTGDN 5692

Query: 4862 IGTNVPLQVGFSNGVNF WQTEGCVSTNFGDVI KPVCAKS PPGEQFRHLVPFLRKGQPWIi 5041

+GTNVPLQVGFSNGV+FV Q EGCV TN G V+KPV A++PPGEQF H+VP LRKGQPW
Sbjct: 5693 VGTNVPIiQVGFSNGVDFVAQPEGC^TmX3SWKPVRARAPPGEQF^ 5752

Query: 5042 IVRRRI VQMI SDYLSNLSDILVFVLWAGSIjELTTMRYFVKI GPIKYCYCGNSATC YNSVS 5221

++R+RIVQMI+D+L+ SD+LVFVLWAG LELTTMRYFVKIG +K+C CG ATCYNSVS
Sbjct: 5753 VLRKRIVQMIADFIAGSSDVIiVFVLWAGGIjELTTMRYFVKXGAVKHCQCGTVATCYNSVS 5812

Query: 5222 NEYCCFKHALGCDYVYNPYAFDIQQWGYVGSIiS 5320

N+ YCCFKHALG CDYVYNP Y DIQQWGYVGSLS
Sbjct: 5813 NDYCCFKHALGCDYVYNPYVIDIQQWaYVGSIiS 5845



5. Sequence E 

6143 nucleotides; 3' end of Replicase and 5' end of Spike
TCTGGtfVATTGTAATGTTgATM^ 
TTAATTTAGAAGGTGTTAATGGTGGTTCTCT^ 
GTGCTTTTGTTAAATTAAAACCTATGCOT 

TTAATTATGTACCCCTTCGCGCTAGTAGTTGTGTTACCCGTTGTAATATAGGTGGTGCTGTTTGTTCAAAACATG 

CAAATTTGTATCAAAAATATGTTGAGGGATATAATACATTTACACAGGCTGGTT 

GTTTTGATGTTTATAATTTGTGGCAAATTTTTATTGAAACT^ 

TTGTAAAAAAaGGGTGTTTTACTGGTGTTGATGGTGAGT^^ 

GCTATGGCGATGTTGACAACTTGGTTTTTACAAATAAAACAAC^^ 



CAAAACGAAAiUVLTGGGTTTAACACCACCATTGTCTATTCTCAAAA? 

TTTTATGGGATTATGAAGCTGAAAGACCTTTTACCTCATATACTAAv^^ ^i^xv, .. x^^g^ y^.. - 
AGGATGTTTGTGTTTGTTTTGACAATAGTATTGA 



AGGATGTTTGTGTTTGTTTTGACAATAGTATTCAGGGTTCGTATGAGCGTTTTACGCTTACTAC 1 
TATOTTGTAOTGTTGTCATrAAAAA^^ 
CTTCTATTAAGJ^ 

AAGATCATTATGATGGTTTTTACACTCAAGGTAGGAATTTATCAGACTTTACA 

ATT^CTTAAGATGX^TATC 

ATGGTCATGTTTCAAAAACTACATTA^ 

TTTOAAAGCTOATGAOTTTGTCACTGCTTCTGAC^ 

SraTAAraTCTAA^ 
acSStctc^c™ 

! n^SSSn^m^a n a nar^paf^TTT^TPAATACCTAAATAGCACTACy^TGTGCGTACCTCCATAATATGC 




a^ctgatg^aaatgatatc 
gt^tottag^tcatttagtctcaactaaatga^ 



tctgactaataaaataciattt^ 

ttoaactaa^totttc^actoctcatgattgtatagttaatttgtc^t 
cataactatatcgggtgaaactgtac^ 

TAAACOTACTAAACTTAGTGT^^ 

TCTTAATGTCACCACACTTAA.T 

CATA^TCTCTT^CAGGAT^ 

TCC^2AGJ^TAGTGGACG 

TAAATCTTCAACTGGTTTTGTTTATT^ 

TGTTGCTGATGTTATGCGTTACAATCTTAACCTC^GTGCTA 

T^AAAACTTTACAGTACGATGTTTTGTOTTATTGTAGTAATTCT^ 

TCTOGGCCCTTCCTCTCAACCTTATTACTGTTTTATA^ 

GGGTATTTTACCACCCACTGTGCGTGAAATTGTTGTTGCTAGAACTGGT^ 

S?c5™Sggtt5catagaagctgtc^ 

ATTTGCTACTTTTGTTGATGTTTTGGTTAATGTTAGTGCaACTAACATTCAAAACTTACTTTATTGCGA^TCTCC 

ATTTGAAAAGTTGCAGTGTGAGCACTTGCAGTTTGGATTGCAAGATGGTTTTTATTCTGCAAATTTTCTT^ 

TAATGTTTTGCCTGAGACTTATGTTGCACTCCCCATTTATTATCAACATACGGACATAAATTTTACT 

ATCTTTTGGTGGTTCTTGTTATGTTTGTAAR.CCACGCCAGGTTAATATATCTCTTAATGGTAACACTT^GTGTG 

TGTTAGAACATCTCATTTTTCAATTAGGTATATTTATAACCGCGTTAAGAGTGGTTCACCAGGTGACTC 

GCATATTTATTTAAAGAGTGGCACTTGTCCATTTT 
. TTGTTTCTCAACCGTCGAAGTGCCTGGTAGTTGTAATTTTC 
TATTGTTGGTGCTTTGTATGTTACTTGGTCTGAAGGTAATTCC^ 
TCGTGAGTTTAGTAATTTAGTTTTAAATAATTG^ 

•CACTGGTAACATTTTTATTGTGAC^CCATGTAACCAACCAGATCAAGTAGCTGTTTATCAACAAAGCAT 
!TGGCATGACCGCTGTTAATGAGTCTAGATATGGCTTGCAAAACTTACTACAGTTACCTAACTTTTATTA 
iTAATGGTGGTAACAATTG(^CTACGGCTGTTATGATTTATTCTAATTTTGGTATTTGTGCTGATGGTTC 
'TCCTGTtCGTCCGCGTAaTTCTAGTGATAATGGTATTTCAGCCATAATCACTGCTAATTTATCCATTCC 
tCTGGACTACTTCAGTTGRAGTTGAGTACCTCCAAATTACTAGTACTCCAATAGTTGTTGATTGTGCTAC 
TTATGTGTGTAATGGTAACCCTCGTTGTAAGAATCTACTTAAGCAGTATACTTCTGCTTGTAAAACTATTGAAGA 
TGCCTTACGACTTAGTGCTCATTTGGAAACTA&TGATGTTAGTAGTATG^^ 

TTTGGCTAATGTTACTAGTTTTGGAGATTATAACCTTTCTAGTGTTTTACCTCAGAGAAACATTCATTCAAGCG 
™ ™ ~™ n/ia n^anTV^r-rpTTr'flAaaaTTTriTTnTTTAGCAAAGTTGTTACATCTGGTTTGGGTACTGTTGATGT 



ACGTTC 
TGTTT 



CTTC3^CCAGTCACTTGCrGGTGOTATTA 
TTCCACTGGTAACATTTTTATTGTGACACCATGTAACCAACCAGATCAAGTAGCTGTTTATCAACAZUVGCAT 

'TGGTGCCATGACCGCTGTTAATGAGTCTAGATATGGCTTGCAAAACTTACTACAGTTACCTAACTTTTATTA 
TGTTAGTAATGGTGGTAAGAATTGCACTACGGCTGTTATGATTTATTCTAAT^ 

TTTAATTCCTGTtCGTCCGCGTAATTCTAGTGATAATGGTATTTCAGCCATAATCACTGCTAATTTATCCATTCC 
GTCTAACTGGACTACTTCAGTTGAAGTTGAGTACCTCCAAATTACTAGTACTCCAATAGTTGTTGATTGTGCTAC 
TTATGTGTGTAATGGTAACCCTCGTTGTAAGAATCTACTTAAGCAGTATACTTCTGCTTGTAAAACTATTCAAGA 
TGCCTTACGACTTAGTGCTCATITGGAAACTAATGATGTTAGTAGTATGCTAACTTTCGATAGCAA 
TTTGGCTAATGTTACTAGTTTTGGAGATTATAACCTTTCTAGTGTTTTACCTCAGAGAAACATTCATTCAAGCG 
TATAGCAGGACGTAGTGCTTTGGAAGATTTGTTGTTTAGCARAGTTGTTACATCTGGTTTGGGTACTGTTGATGT 
TGACTATAAGTCTTGTACTAAAGGTCTTTCTATTGCTGACCTTGCTTGTGCTCAGTACTACAATGGGATAATGGT 
TTTGCCAGGTGTTGCTGATGCTGAACGTATGGCCATGTACACAGGTTCTCTTATAGGTGGCATGGTGCTCGGAGG 
TCTTACATCAGCAGCCGCCATACCTTTTTCTTTGGCACTGCAA.GGA.CGACTTAACTATGTTGCTTTACAAACTGA 
TGTGCTTCAAGAAAATCAGAAAATTTTGGCTGCATCATTTAATAAGGCTATTAATAATATTGTTGCTTGTTTTAG 
TAGCGTTAATGATGCTATTACACATACTGGR.GAGGCTATACATACTGTTACTATTGCACTTAATAAGATTGA 
TGTTGTTAATCAACAGGGTAGTGCTCTTAACCATCTCACTTCACAA.TTGAGACA.T7^TTTTGAGGCCATTTCTAA 
TTC^TTCATGCTATTTATGACCGGCTTGATTCAATTCAAGCCGATCAACA^ 
GCTTGCAGCTTTGAATGCATTTGTTTCCCAAGTTTTGAATAAftJA 

ACAGCAGAAGATTAATGAATGTGTCAAGTCACAft.TCTAATAGATATGGTTTTTGTGGCAATGGCACTCACATCTT 



^CAATCGTCAACTCAGCTCGAGATGGTTTGCTTTTTCTT 
AA&GGCGTGGTCTGGTATCTGTGTTGATGGCATTTATGGCTATGTTC 
TGATAATGGTGTCTTTCGTGTAACTTCC^ 
AATATATAATTGTAATGTTACTrTTO^^ 

TGTTAATAAAACATTACAAGAGTTTGCACAAAACTTACC^ 

TAATTTAAC^TATCTTAATTTGAGTTCTGAGTTGAAGCAACTCGAAGCTAAAACTG^TA 

Hypothesised ORFs 

>~out: 3 to 2357: Frame 3 785 aa

VKKGCFTGVDGELPVAVYtTOKVFVRYGDVDNLVFTNKT^ 

^^VSKTTLGGLHLLISQFRLSKMGVLKADDFVTASDTTIiRCCTVT^ 

SDVISTOMVIjSIiIKSGRLLIiRNSGRFGGPSNHLVSTK v v v xuj\. 

>~out: 277 to 438: Frame 1 54 aa 

VVLFVQNMQ I C I KNMLRH 1 1 HLHRL VLTFGYH I VLMF 1 1 CGKFLLKL I YKVLKI 
>~out: 457 to 618: Frame 1 54 aa 

KKGVLLVLMVS YLLQLLTTKFIjFAMAMIiTTWFLQ I KQHCLLMLLIiNCLQNEKWV 
>~out: 622 to 852: Frame 1 77 aa

HHHCLFSKILVIiliHINIiFYGIM^^ 



KI 



TLFYFIiLLS: 



>~out: 937 to 1149: Frame 1 71 aa

^ou? F 13^t^ 

I INLIGGCCGVKITTCPLFIHSC^ IMVLVLSCLW 
>~out: 1738 to 1935: Frame 1 66 aa 

SLIMISMIMIiVMQILALQVIVIiLFTIiKI SLTYLFLICMMVELNFVMVKTSIjKMVFLIiI LMVLLEKN 
>~out: 2357 to 6142: Frame 2 1262 aa

^QDGFY£^U3D1^PETYVAI,PIYYQH^^ 
^S?^^^ G ^ PGDS SWHI YIiKSGTCPFSFSKLNNFQKFKTI CFSTVEVPGSC^^PLEA^WffiraS YT]^GALYVTM< 

EGNSITGVPYPVSGIREFSl^VIiNNCTKWIYDYVGTGIIRSSNQSIiAGGITYVSNSG 
PCTOQPDQVAVYQQSIIGAMTAVNESRYGLQNL^^ 

dngisaiitanlsipsnwttsvqveyi^itstpivvd^^ 

^VSS^TFDSNAFSIJmVTSFGDYl^SSVLPQR^ 
VFLimQLLSQVCCQSIGF^IRVHL^ 

>~out: 2781 to 2954: Frame 3 58 aa

^YRVKLYVCIYIMQLVLFMCRPLIN^^ 
>~out: 3126 to 3296: Frame 3 57 aa 



>~out: 3546 to 3806: Frame 3 87 aa 

>~out: 3810 to 3986: Frame 3 59 aa 

I LLQLHLLVVLVMFVNHARLI YLLMVTLQCVLEHLI FQLGI FITALRWHOVTLHGI F T 
>~out: 4026 to 4217: Frame 3 64 aa 

I IFKSLRI^SQPSKCLVWIFHLK^ 

>~out: 4227 to 4376: Frame 3 50 aa

I XVPNI I FMIMLVLELYVLQTSHIiLVVIiHMFIiTLVIYIiVIiKMFPIjVTFIiL 
>~out: 5157 to 5447: Frame 3 97 aa 
VAWCSEVIjHQQPP YLFLWHCKHDI1TMI1LYKLMCPKKI RKFWLHHIj IRIiLI HjLLLLVAXiMMIjIjHI IiQRLYILLLL
HLIRPRMLLINRWIiLTISLHN 

>~out: 5625 to 5774: Frame 3 50 aa

HSRRIiMNVSSHl^IDMVFVAMALTSFQSSTQLQMVCFFFILFCCQIiITRM 

>~out: 5874 to 6065: Frame 3 64 aa 

LPGSCFNIiVYLFCLILCKYIIVMLLLLT^ 
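
The ORF headers listed above can be sanity-checked arithmetically: each coordinate span should cover exactly three nucleotides per listed amino acid, and the reading frame should follow from the start coordinate. The following is a minimal Python sketch, assuming 1-based inclusive coordinates and the header layout `>~out: START to END: Frame F N aa`; the names `parse_orf_header` and `is_consistent` are illustrative and not part of the original listing.

```python
import re

# Assumed header layout, taken from the ORF listing, e.g.
# ">~out: 277 to 438: Frame 1 54 aa"
HEADER = re.compile(r">~out:\s*(\d+)\s+to\s+(\d+):\s*Frame\s+(\d)\s+(\d+)\s+aa")


def parse_orf_header(line):
    """Return (start, end, frame, aa_count), or None if the line is not an ORF header."""
    m = HEADER.search(line)
    if m is None:
        return None
    return tuple(int(g) for g in m.groups())


def is_consistent(start, end, frame, aa_count):
    """Check span length and frame, assuming 1-based inclusive coordinates."""
    span = end - start + 1                 # number of nucleotides covered
    expected_frame = (start - 1) % 3 + 1   # frame implied by the start position
    return span == 3 * aa_count and frame == expected_frame


headers = [
    ">~out: 3 to 2357: Frame 3 785 aa",
    ">~out: 277 to 438: Frame 1 54 aa",
    ">~out: 2357 to 6142: Frame 2 1262 aa",
]
for h in headers:
    print(h, is_consistent(*parse_orf_header(h)))
```

A header that fails this check most likely contains a misread digit, which is a quick way to spot transcription errors in the coordinates.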

Alignment 



>gi|12175747|ref|NP_073549.1| replicase polyprotein 1ab [Human coronavirus 229E]
gi|80179827|sp|Q05002|RLAB_CVH22 Replicase polyprotein 1ab (pp1ab) (ORF1ab polyprotein) [Includes:
Replicase polyprotein 1a (pp1a) (ORF1a)] [Contains: p9;
p87; p195 (Papain-like proteinases 1/2)
(PL1-PRO/PL2-PRO); Peptide HD2; 3C-like proteinase
(3CL-PRO) (3CLp) (M-PRO) (p34); Unknown protein 1; p5;
p23; p12; Growth factor-like peptide (GFL) (p16);
RNA-directed RNA polymerase (RdRp) (Pol) (p100); Helicase
(Hel) (p66) (p66-HEL); Unknown protein 2; p41; Unknown
protein 3].
gi|12082740|gb|AAG48591.1| replicase polyprotein 1ab [Human coronavirus 229E]
Length = 6758

Score = 1332 bits (3448), Expect = 0.0
Identities = 630/789 (79%), Positives = 695/789 (88%), Gaps = 4/789 (0%)
Frame = +3
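
The percentages in a summary block like the one above follow directly from the printed counts, so they can be used to cross-check a transcription. The following is a minimal Python sketch, assuming the `Identities = a/b (p%)` layout shown here and assuming that the printed percentage is truncated rather than rounded (which is consistent with 630/789 being printed as 79%); `check_summary` is an illustrative name, not part of any BLAST tool.

```python
import re

# Matches fields such as "Identities = 630/789 (79%)" in a BLAST summary line.
FIELD = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def check_summary(line):
    """Return {field: (count, total, printed_pct, recomputed_pct)} for one summary line."""
    result = {}
    for name, count, total, pct in FIELD.findall(line):
        count, total, pct = int(count), int(total), int(pct)
        # Assumption: the printed percentage is truncated, not rounded.
        result[name] = (count, total, pct, count * 100 // total)
    return result


line = "Identities = 630/789 (79%), Positives = 695/789 (88%), Gaps = 4/789 (0%)"
for name, (count, total, printed, recomputed) in check_summary(line).items():
    print(name, count, total, printed, recomputed)
```

Where the printed and recomputed percentages disagree, either the count or the percentage has probably been misread.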

Query: 3 WCNVDMYPEFSIVCRFDTRTRSVFN^ 182
WNCNVDMYPEFSIVCRFDTRTRS NLEGVNGGSLYVN HAFHTPAYDKRA KLKP PF 
Sbjct: 5970 WNCNTOMYPEFSrVCRFOT 6029

Query: 183 FYFDDSDmWQEQWYVPLRASSCvTOCNIGGA 362

FY+DD C+W +QVTm7PIiRA++C+T+CNIGGAVCSKH2^NIjY-»- YVE+YN FTQAGFNI 
Sbjct: 6030 FYYDDGSCEVVTDX2vTOYVPL 6089 


Query: 363 OTPHSFDVYI^WQIFIETl^SLEN^ 542 

OTP +FD YNLWQ F E NLQ LENIAFNW KG F G DGELPVA+ DKVFVR G+ D 
Sbjct: 6090 OTPTTFDCraLWQTFTEvmQGLEN^ 6149 

Query: 543 NI* vTTNKTTLPTNVAFELFAKRKMGIiTPPLS I LKE^GVVATYKFVliWDYEAERPFTS YTK 722
NLVF NKT+LPTN+AFEIiFAKRK+GLTPPLSILKl^GWATYKFVIiWDYEAERP TS-t-TK 
Sbjct: 6150 NLVT'VNKTSLPTN^FELFAKRK^G^ 6209 

Query: 723 SVCKYTDFNED VCVCFDNS IQGS YERFTLTTNAVTiFSTWI K NLTPIKLNFGMLNG 890 

SVC YTDF EDVC C+DNS IQGSYERFTL+TNAVLFS +K +L IKLNFGMLNG

Sbjct: 6210 SVCGYTDFAEDVCTCYDNSIQGSYERFTLSTNAVIjFSATAVKTGGKSLPAIKL 6269 

Query: 891 MPVSSIKSDKGVEKLvTS^XYTOKNGQFQDHYTC 1070
++++ KS+ G K +NW+ YVRK+G+ DHYDGFYTQGKNL DF PRS ME DFIjNMD+ 
Sbjct: 6270 NAIArn^EDGNIKNINWFVYVRKDGKPTOHYDG 6329

Query: 1071 GVF INKYGIiEDFNFEHVAnTGDVSKTTLGGLHLIj I SQFRLS KMGVLKADDFVTASDTTLRC 1250 

GVFI KYGLEDFNFEHVVYGDVSKTTLGGLHLI*ISQ RLSKMG+LKA++FV ASD TIi+C 
Sbjct: 6330 GVF I QKYGIiED FNFEHWYGDVS KTTLGGLHLIi I SQ VRLS KMGI LKAEEFVAASDI TLKC 6389 


Query: 1251 CT VTYIilTOLS SKVVCTYMDLLIiDDFVTILKSLDLGVI S KVHEVI IDNKPYRWMLWCKDNH 1430 

CTVTYLN+ SSK VCTYMDLLLDDFV++LKSLDL V+ S KVHEVI I DNKP+RWMLWCKDN 
Sbjct: 6390 CTVTYLNDPS S KTVCTYMDLIxLDDFVSVLKSLDLTVVS KVHEVI I DNKPWRWMliWCKDlSrA 6449 

Query: 1431 LSTFYPQLQSAEWKCGYAMPQIYKLQRMCLEPCNLYNYGAG 1610
++TFYPQLQSAEWKCGY+MP IYK QRMCLEPCNLYNYGAG+KLPSGIM NWKYTQLCQ 
Sbjct: 6450 v7VTFYPQLQSAEWKCGYSMPGIYKTQRMCLEPCNLYN^ 6509 

Query: 1611 YLNS TTMCVPHNMRVLHYGAGSDKG V7^GTTVLKRWIj P PXXXXXXXXXXXYVSDADF SIT 1790 
Y NSTT+CVPHNMRVLH GAGSD GVAPGT VLKRWLP YVSDADFS+T

Sbjct: 6510 YFNSTTLCVPHNMRVLHLGAGSDYGVAPGTAVLKRWIjPHDAIWDND 6569 

Query: 1791 GDCATV^IjEDKFDLLISDMYDGRIKFOT 1970 
GDCATVYLEDKFDLLISDMYX)GR K DGENVSK+GFFTY+NG I EKLAIGGS+AIK+T 
Sbjct: 6570 GDC^TVYXiEDKFDIiLISDMYDGRTKAIDGENVSKEGFFTYINGFI CEKLAIGGS I AI KVT 6629

Query: 1971 EYSWNKYIiYELI QRFAFWTLFCTSVNTSS SEAFIiIG INYLGDFI QGPF IAGNTVHANYI F 2150 
EYSWNK LYEL+QRF+FWT+FCTSVNTSSSEAF++G1NYLGDF QGPFI GN +HANY+F 
Sbjct: 6630 EYSWNKKLYELVQRFSFWTMFCTSVOTSSSEAFW 6689 

Query: 2151 WRNSTIMSLSYNSVLDLSKFECKHKATV^ 2330 

WRNST+MSLSYNSVLDLSKF CKHKATWV LKDSD+N+MVT1SL++SG+LL+R +G+ 

Sbjct: 6690 WRNSTVMSLSYNSVLDLSKFNCKHKATVWQLKDSDI^ 6749 



Query: 2331 FSNHLVSTK 2357 

FSNHLVSTK 
Sbjct: 6750 FSNHLVSTK 6758 

>gi|13604332|gb|AAK32188.1| spike glycoprotein [Human coronavirus 229E]
Length = 1178 



Score = 1891 bits (8600), Expect = 0.0

Identities = 682/1069 (63%), Positives = 833/1069 (77%), Gaps = 7/1069 (0%) 
Frame = +2 

Query: 2948 GRI VNYTVCDDCNGYTDNI FSVQQDGRI PNGFPFNNWFLLTNGSTLVDGVSRLYQPLRLT 3127 
G +Y+VC+ C GY++N+F+V+ G IP+ F FNNWFLLTN S++VDGV R +QPL L

Sbjct: 21 GLNTSYSVC^GCVGYSENVFAVESGGYIPSDFAFNNW 80 

Query: 3128 CXWPVPGLKSSTGFVTENATGSDVNCNGYQHNSVADv^^ 3307 
CLW V GL+ +TGFVYFN TG +C G+ +' ++DV+RYWLN ^6 1^ 

Sbjct: 81 CLWSVSGLRFTTGFVYFNGTGRG-DCKGFSSDVLSDVIRYNLNFE ENLRRGTILFK 135



Query: 3308 TLQYDVXFYCSNSSSGVTiDTTIPFGPSSQPYY 3487

T V+FYC+N++ D IPFG +YCF+N+TI S FVG LP TVRE V+ 

Sbjct: 136 TSYGVWFYCTNimiVSGDAHIPFGTvXGNFYCF\^ 195

Query: 3488 ARTGQFYINGFKYFDLGFIEAVNFNVTTASATDFWTOAFATFVDVLVWSATN 3667 

+RTG FYING++YF LG +EAVNFNVTTA TDF+TVA A++ DVLVNVS T+I N++YC 
Sbjct: 196 SRTGHFYINGYRYFTLGliTVEAvTSrFNV^ 255 

Query: 3668 DSPFEKLQCEHLQFGLQDGFYSANFLDDNVLPETYVALP ASFGG 3838

+S +L+C+ L F + DGFYS + + LP + V+LP+Y++HT X S GG 

Sbjct: 256 NSVINRLRCDQLSFDVPDGFYSTSPIQSvTSLPVSIVSLPVT^ 315

Query: 3839 SCYVCKPRQVNISL-NGNTS---VCWTSHFSIRYIYmVKSGSPGDSSWHIYLKSGTCP 4006 
C+ C P VNI+L N N + +CV TSHF+ +Y+ G W + +G CP

Sbjct: 316 KCFN CYPAGVNITLANFNETKGPLCVDTSHFTTKYVAVYANVGR WSA3INTGNCP 370 

Query: 4007 FSFSKLl^FQkFKTICFSTVKVPGSCNFPLEATWHYTSY^I 4186 
F SP K+NNF KF ++CFS ++PG C P+ A W Y+ Y +G+LYV+WS+G+ ITGVP 
Sbjct: 371 FSFGKVNNFV1CFGSVCFSLKDIPGGCAMPIVANWAYSKYYTIGSLYVSW 430

Query: 4187 PVSGIREFSNLVLNNCTKOTIYDYVGTGIIRSSN 4366

PV G+ F N+ L+ CTKYNIYD G G+IR SN + GITY S SGNLLGFK+V+ G 
Sbjct: 431 PVEGVS S FMNVTLDKCTKYNT YDVSGVGVI RVSNDTFLNGI TYTSTSGNLLGFJKDVTKGT 490

Query: 4367 I FI VTPCNQPDQVAVYQQSI IGAMTAV^raSRYGLQNLLQLPNFYYVSNGG^CTTAVMl Y 4546 

1+ +TPCN PDQ+ VYQQ+++GAM + N + YG N+++LP F+Y SNG NOT AV+ Y 
Sbjct: 491 IYSITPCNPPDQLVVYQQAWGAMLSENFTSYGFSNVV^ 550 

Query: 4547 SNFGICADGSLIPVRPRNSSDNGISAIITANLSIPSNWTTSVQvTSYLQIT^ 4726
S+FG+CADGS+I V+PRN S + +SAI+TANLSIPSNWTTSVQVEYLQITSTPIWDC+T 
Sbjct: 551 SSFGVC^GSIIAVQPIUWSTOSVSAIOT^ 610 

Query: 4727 YVCNGNPRCKNLLKQY^SACKTIBDALRLSAHLETNDVS SMLTFDSNAFSLANVTS FGD Y 4906 
YVCNGN RC LLKQYTS ACKTI EDALR SA LE+ DVS MLTFD AF+LANV+SFGDY

Sbjct: 611 YVCNGNVRCVELLKQYTSACKTI EDALRNSARLESADVS EMLTFDKKAFTLANVSSFGD Y 670 

Query: 4907 NLSSVLPQIWraSSR^^ 5086

NLSSV+P SR+AGRSA+ED+LFSK+VTSGLGTVD DYK+CTKGLSIADLACAQYY

Sbjct: 671 I^SSVIPSLPTSGSRVAGRSAIEDILFSKIOTSGLGTVE&DYKNCTKGLSIAD 730

Query: 5087 NGirWLPGVADAERMAMYTGSLIGG]^ 5266 

NGIMVLPGVADAERMAMYTGSLIGG+ LGGLTSA +IPFSLA+QARLNYVALQTDVLOEN' 
Sbjct: 731 NGIMVLPGVADAERMAMYTGSLXGGIALGGLTSAVSIPFSIAIQARLN^ 790 

Query: 5267 QKILAASFNKAINNIVASFSSVNDAITHTAEAIOT 5446 

QKILAASFNKA+ NIV +F+ VNDAIT T++A+ TV ALNKIQDVVNQQG+ +LNHLTSO 
Sbjct: 791 QKILAASFNKAMTNIVDAFTGVIJDAITQTSQALQT^ 850

Query: 5447 LRHNFQAXSNS IHAI YDRLDS IQADQQVDRLI TGRIiAALNAFVSQV^ 5626



LR NFQAIS+SI AIYDRLD+IQADQQVDRLITGRLAALN FVS L KYTEVR SR+LA 
Sbjct: 851 LRQNFQAISSSIQAIYDRLDTIQADQQVDRLITC 910 

Query: 5627 QQKSNECVICSQSimYGFCGNGTHIFSIWS 5806
U^^^^g RYGFCGNGTHIFSIVN+AP+GL+FLHTVLLPT YK+V+AWSG+CVDG
Sbjct: 911 QQKVNECVKSQSKRYGFCGNGTHI FS I VNAAPEGLVFLHTVLLPTQYKDVEAWSGLCVDG 970 

Query: 5807 I YGYVLRQPNLVLYSDNGVFRWSRVMFQPRLPVLiSDFVQI YNCNVTFVNI SRVELHTVI 5986
GYVLRQPNL LY + +R+T SR+MF+PR+P ++DFVQI NCNVTFVNISR EL T++
Sbjct: 971 TNGYVLRQPNLALYKEGNYYRITSRIMFEPRI PTMApFVQIENCNVTFVNI SRSELQTIV 1030

Query: 5987 PDYTOWKTLQEFAQNLPKYV^PNFDLT^ 6133

P+Y+DVNKTLQE + LP Y P+ + +N T LNL+SE+ LE K+A 
Sbjct: 1031 PEYIDVNKTLQELSYKLPNYTVPDLVVEQYNQTILNLTSEISTLENKSA 1079



6. Sequence F

3062 Nucleotides encoding putative 3' end of Spike, hypothetical nsp 3, Envelope protein 5B, Matrix and 

AGcS 

TATTTCTGTTGTTOT^ 

CAATTGTTTAACTTCATCAATGCGAGGCTG 
GGTTCCACGTTCAATAATGC^^ 
imTCTCAAATTACCACCTCATGATGTTAC^ 
ACTGCTTATTTCTTAGTT^^ 

GCTTGTTTTGTTTTAAAACTATTGACACTATCT^^ 

AGTTTTATAATTTTTTTTCTACGCTGTTG 

xTTTCATTTGTTTTGTTCAATGTTACTAAACT 

ATXSaAAATGOTTTTGCTro 

CTTTTGATC^CCTTTATGTTC^ 

ATAATGGTGCTGTCATTTAGAOT 

ATGTTCCTTCGATTAATTGATGACAATGGCATTGTCCT 

TTTGTGTTGGCAATGACCTTTATTAAACT^ 

TATCAACCAGTTTATAAAATTTTTC^^ 

AATGTCTAAAOTAAACGATGTCTAATAGTAGTGTG 

TTAGTTGGAATTTAATTCTAACAGxTTTO 

ATGGTTTAAAGATGTCTCTTOTATCGTGTTTATGGCCACT 

ATTTTAATGTGGACTGGGTCTTTOT 

ATTTTGTTAATAGTTTCAGACTTTGGCGCCGT 

TCTCTCTCCAGGTxTATGGACATAATTATTACra 

TOAGTGGTGTACTTCTTGxTGATGGCCATAAGAOT 

TAGTTGCTACACCTAGTACCACAATTGTTTGTGACCGTGTTGGTCGCTCTGTO 

GGGCATTCTACGTCCGTGCTAAACATGGTGATTT^^ 

AGAAGTTGCTTCATTTAATCTAAACTA^ 

AAATTTCCTCCTCCTTCATTTTACATGCCT 

AATCTTGTCCCTATTGGTAAGGGTAATAAAGAT 

CGCAGGGGGGAACGTGTTGATTTGCCTCCTAA 

AAATTCAGACAACGTTCTGATGGTGTTGTTO^ 

AATCGCAAACGTAATCAGAAACCTTTGGA^^ 

TTTGAGGATCGCTCTAATAACTCATCTCGTGCTO 

CGTAGTACTTGAAGACAACAGTCTCGCACTCGTTCTGATTC 

ACTTTGGCTTTAAAGAACTTAGGTTTTGATAAC 

AAACCTAATAAGCCTCTTTCTCAACCGAGGGCTGATAAGCCTTCTCAGT 

CCTAC^GAGAGGAAAATGTTATTC^^ 

GTTCAGAATGKSTGTTGATGCCAAGGGT^ 

GATAGTGAGGTTAGCACTGATGAAGTGGGTGATAATGOT 

GATAATAAGAACCTTCCTAAGTTCATTGAGCAGATTAGTG 

TCAGAATCATCTCATGTTGCTCAGAACACAGT^ 

GATTC^GCCATTATAGAAATTGTCAACGAGG^^ 

TATTAGTTGCAACNCCCATGGTTTAGCGCAT^ 

Hypothesised ORFs 

>~out: 17 to 238: Frame 2 74 aa 

FEIiLNRFElTCIKWPWWWLIISWFVv^ 

>~out: 223 to 723: Frame 1 167 aa 




FMKIVLLLFMVVTTMSF 

>~out: 525 to 917: Frame 3 131 aa 

SFDDLYVAIRGSCEKNLQLMRKVDLVNGAVIYIFAEEPWGIVYSSQLYEDVPSIN 
>~out: 877 to 1131: Frame 1 85 aa 

>~out: 1140 to 1820: Frame 3 227 aa 

?^SSVPLSEVYVHLRNWNFS^ 

WVFFGFS ILMS I ITLCLWVMYFVNSFRLWRRVKTFWAFNPETNAI I SLQVYGHNYYLPVMAAPTGVTLTLLSGY 

LVDGHKIATRVQVGQLPKXVIVATPSTTIVCDRVGRSVNETSQTGWAFYVRAKHGDFSGVASQEGVLSEREKQI 
LI 

>~out: 1324 to 1539: Frame 1 72 aa

LmFLTVLSILMWTGSFLVLWLCLL^ I ITYR 

>~out: 1654 to 1815: Frame 1 54 aa

LLHLVPQLFVTVLVAIiIjMKQARLVGHSTSVLNMVIFIiVIjPLRRVFCQKERSCFI 
>~out: 1819 to 2964: Frame 1 382 aa 

SKIiNKMASVNWADDRAARKKFPPPSFYMPLLVSSDKAPYRVIPRNIjVPIGKGNKDEQIGYWNVO 
pLPPKVHFYYLGTGPHKDLKFRQRSDGVVWVAKEGAKTVNTSLGOTUOW 

NSSRASSRSSTRMMSRDSSRSTSRQQSRTRSDSNQS'SSDLVAAVTIALKNIjGFDNQSKSPSSSGTSTPKKPNKE 
SQPRADKPSQLKKPRWKRVPTREENVIQCFGPRDFNHNMGDSDLVQNGVDAKGFPQLAEIilPNQAALFFDSE^ 

IVNEVLH 

>~out: 1847 to 2074: Frame 2 76 aa 




I GPMTEIiLGRNFLLLHFTCLFWLVIi IRHHI GSFPGI LSIiLVRVI KMSRLVIGMFKSVGVCAGGNVLI CLLKFI F 

>~out: 2078 to 2410 : Frame 2 111 aa 

>~out: 2771 to 2938: Frame 2 56 aa 

LRI IRTFLSSLSRLVIJuLNPVLSKKCS 

Alignment 

>gi|13604336|gb|AAK32190.1| spike glycoprotein [Human coronavirus 229E]
Length = 1173

Score = 50.4 bits (119), Expect = 7e-06
Identities = 26/71 (36%), Positives = 31/71 (43%)
Frame = +2

Query: 26 LNRFENYIKWPWDCXXXXXXXXXXXXX 205 

XjNR E YIKWPW S+RGCC-l- STKL 

Sbjct: 1105 LmVETYIKWPWWVWLCISWLIFWSMI^LCCCSTGCCGFFSCFASSI 1162

Query: 206 PYYEFEKVHVQ 238 

PYY+ EK+H+Q 
Sbjct: 1163 PYYDVEKIHIQ 1173 

>gi|12175749|ref|NP_073552.1| 4a protein [Human coronavirus 229E]
gi|138983|sp|P19739|VN4A_CVH22 Nonstructural protein 4a (ORF4a)
gi|74871|pir|| nonstructural protein 4 - human coronavirus (strain 229E)
gi|58923|emb|CAA33682.1| unnamed protein product [Human coronavirus 229E]
gi|12082742|gb|AAG48593.1| 4a protein [Human coronavirus 229E]
Length = 133

Score = 71.6 bits (174), Expect(2) = 1e-17
Identities = 41/95 (43%), Positives = 56/95 (58%)
Frame = +1



Query: 253 GLFQLTLESTINKSVMTLKLPPHDVTV^ 

GLF L L S +N+S++N K+ + ++K T + AY L+SLFV YFAMK 

Sbjct: 4 GLFTLQLVSAVTffQSLSimKVSAEVSRQV^^ 63

Query: 433 TARGRVACF VXKLLTIiSVYVPLL VXiFGMYLDSFI I 537 



RGR A V K+L I, VYVPLL Y+ + +1 



Sbjct: 64 SHRGRAALIVFKILILFVYVPLLYWSQAYIYATLI 98 



Score = 40.4 bits (93), Expect(2) = 1e-17
Identities = 15/30 (60%), Positives = 22/30 (73%) 
Frame = +3 

Query: 549 LL FRF I HVGYYAYL YKNF SFVIiFJIVTKIiCF 638 

LL RF H ++ +LYK + F++FNVT LC+ 
Sbjct: 102 LLGRFFHTAWHCWLYKTWDFIVFNVTTLCY 131 



>gi|12175750|ref|NP_073553.1| 4b protein [Human coronavirus 229E]
gi|138992|sp|P19740|VN4B_CVH22 Nonstructural protein 4b (Nonstructural protein 5A) (ORF4b)
gi|74872|pir||MNIHH2 nonstructural protein 5A - human coronavirus (strain 229E)
gi|58924|emb|CAA33683.1| unnamed protein product [Human coronavirus 229E]
gi|12082743|gb|AAG48594.1| 4b protein [Human coronavirus 229E]
Length = 88

Score = 86.7 bits (213), Expect = 2e-16
Identities = 38/80 (47%), Positives = 54/80 (67%)
Frame = +1

Query: 640 VS GKCWYLEQSFYENRFAAI YGGDHYWLGGETI TFVS FDDLYVAI RGS CEKNLQLMRKV 819 

+ GKCW+LE + F YGGD ++ +G +++ S +DLYVA+RG +K+L L RKV 
Sbjct: 1 MQGKCWFLENKALKP - FVCFYGGDQFLYIGDRI VS YFS TNDLYVALRGRIDKDLSLSRKV 59 

Query: 820 DLYNGAVT YI FAEBPWGI V 879 

+LYNG +Y+F E P VGIV 
Sbjct: 60 ELYNGECVYLFCEHPAVGIV 79 

>gi|12175751|ref|NP_073554.1| envelope protein [Human coronavirus 229E]
gi|138994|sp|P19741|VEMP_CVH22 Envelope protein (Protein 5B)
gi|74873|pir||MNIHH3 nonstructural protein 5B - human coronavirus (strain 229E)
gi|58925|emb|CAA33684.1| unnamed protein product [Human coronavirus 229E]
gi|12082744|gb|AAG48595.1| envelope protein [Human coronavirus 229E]
Length = 77

Score = 87.8 bits (216), Expect = 3e-17
Identities = 36/76 (47%), Positives = 55/76 (72%)
Frame = +3

Query: 901 MFLRLIDDNGIVIiNSILWIjLVMIFFFV 1080 

MFL+L+DD+ +V+N +LW +V+I ++ +T IKLI+LCFTCH F +RT+Y P+ ++ 
Sbjct: 1 MFLKLVDDHALVVNYLLWCVVIi I VI LLVC ITI I KLIKLCFTCHMFCNRTVYGP IKNVYHI 60 

Query: 1081 YQDYMQIAPVPAEVLN 
YQ YM I P P V++ 
Sbjct: 61 YQSYMHIDPFPKRVID 76 



>gi|74837|pir||MMIHHC E1 membrane glycoprotein - human coronavirus (strain 229E)
gi|329573|gb|AAA45461.1| membrane protein [Human coronavirus 229E]
Length = 225

Score = 275 bits (703), Expect = 4e-72
Identities = 128/224 (57%), Positives = 159/224 (70%)
Frame = +3



Query: 1143 MSNSSVPLSEVYVHLRNWNFSW]^I&^ 1322
MSN + ++ HL+NWNF WN+ILT+FIV+LQ+GHYKYSRLLYGLKM VLW LWPLVL
Sbjct: 1 MSNDNCT- GDIVTHLKNWNFGWNVILTI FI VILQFGHYKYSRLLYGLKMLVLWLLWPLVL 59

Query: 1323 ALSIFDCFWFNVDWVFFGFSILMS1ITLCLWVMYFVNSFRLWRRVKTFWAFNPETNAII 1502
ALSIFD + N++ +W F FS+LM++ TL +WVMYF NSFRL+RR +TFWA+NPE NAI
Sbjct: 60 ALSIFDTWANWDSNWAFVAFSLLMAVSTLVMWVMYFANSFRLFRRARTFWAWN 119

SLQVYGHNYYL^ AP H++A+ VQV LP+Y+ VA PSTTI+
Sbjct: 120 VTTVT^GQTYYQPIQQAPTGITATCLLSGVLYVD^ 179

Query: 1683 DRVGRSVNETSQTGWAFYVRAKHGDFSGVASQEGVLSEREKLIiH 1814
RVGRSVN + TGW FYVR KHGDFS V+S ++E E+LLH
Sbjct: 180 SRVGRSVNSQNSTGWVFYVRVKHGDFSAVSSPMSNCOTENERLIiH 223 



nucleocapsid protein [Human coronavirus 229E]
gi|77063|pir|| nucleocapsid protein - human coronavirus
unnamed protein product [Human coronavirus 229E]
gi|12082746|gb|AAG48597.1| nucleocapsid protein [Human coronavirus 229E]
Length = 389

Score = 267 bits (682), Expect = 1e-69
Identities = 159/406 (39%), Positives = 222/406 (54%), Gaps = 31/406 (7%)
Frame = +1



Query: 1834 " "DI^^PPPSPYMPLLVSSDKAPYRVIPRKLVPIGKGNKDEQIGYWN 2004 

Sbjct: x SSSJS^S^^ 59

Query: 2005 Jg---5«5^^ 2184 
Sbjct: 60 VQ^RTRKGKRVDLSPKLHPYYLGTGPHKDAKFRERVEG^ATOG^PTCYGV^ 119 
Query: 2185 ^ Q ™^IALPPELS^F^^ 2364

Sbjct: 120 KNSEPEIPHFNQKLPNGSVTVVEEPD SRAPSRSQSRSQSRGRGESKPQSRNPSSDR 174 

Query: 2365 XXXXXXXL^VTI^GFm- - - QXXXXXXXXXXXXXXXXXXLS 2496 

Sbjct: 175 NHNSQDDIMKAVAAALKSLGPDKPQEKDKKSAKTGTPKPSRNQSPASSQTSAKSLARSqI 234

Query: 2497 QPRADKPSQLKKPRW^VPTRE - - ENVTQCFGPRDFNHNMGDSDLVQNGVDAKGFPQIiAE 2670 
Sbjct: 235 B~A££SS£ m £^^ J'
Query: 2671 ^^^fVSTOEVTO^ 2850
Sbjct: 295 ^TA^I^V^^--REMQ 349
Query: 2851 SQSSHVAQMTVlNaSIPE SKPLADDDSAI IEIWEV 2958

I 14» -4- Tim G IS! . . « , ^ • _

Sbjct: 350 Q^L^PSAiipNPSQTSPATA^P^^iETO^EV 388






4. 